PREFACE




My purpose in this book is to treat linear transformations on finite- 
dimensional vector spaces by the methods of more general theories. The 
idea is to emphasize the simple geometric notions common to many parts 
of mathematics and its applications, and to do so in a language that gives 
away the trade secrets and tells the student what is in the back of the minds 
of people proving theorems about integral equations and Hilbert spaces. 
The reader does not, however, have to share my prejudiced motivation. 
Except for an occasional reference to undergraduate mathematics the book 
is self-contained and may be read by anyone who is trying to get a feeling 
for the linear problems usually discussed in courses on matrix theory or
“higher” algebra. The algebraic, coordinate-free methods do not lose power 
and elegance by specialization to a finite number of dimensions, and they 
are, in my belief, as elementary as the classical coordinatized treatment. 

I originally intended this book to contain a theorem if and only if an 
infinite-dimensional generalization of it already exists. The tempting 
easiness of some essentially finite-dimensional notions and results was, 
however, irresistible, and in the final result my initial intentions are just 
barely visible. They are most clearly seen in the emphasis, throughout, on 
generalizable methods instead of sharpest possible results. The reader may 
sometimes see some obvious way of shortening the proofs I give. In such 
cases the chances are that the infinite-dimensional analogue of the shorter 
proof is either much longer or else non-existent. 

A preliminary edition of the book (Annals of Mathematics Studies, 
Number 7, first published by the Princeton University Press in 1942) has 
been circulating for several years. In addition to some minor changes in 
style and in order, the difference between the preceding version and this 
one is that the latter contains the following new material: (1) A brief dis- 
cussion of fields, and, in the treatment of vector spaces with inner products, 
special attention to the real case. (2) A definition of determinants in 
invariant terms, via the theory of multilinear forms. (3) Exercises. 

The exercises (well over three hundred of them) constitute the most 
significant addition; I hope that they will be found useful by both student 
and teacher. There are two things about them the reader should know.
First, if an exercise is neither imperative (“prove that . . .”) nor interroga- 
tive (“is it true that . . . ?”) but merely declarative, then it is intended 
as a challenge. For such exercises the reader is asked to discover if the 
assertion is true or false, prove it if true and construct a counterexample if 
false, and, most important of all, discuss such alterations of hypothesis and 
conclusion as will make the true ones false and the false ones true. Second, 
the exercises, whatever their grammatical form, are not always placed so 
as to make their very position a hint to their solution. Frequently exer- 
cises are stated as soon as the statement makes sense, quite a bit before 
machinery for a quick solution has been developed. A reader who tries 
(even unsuccessfully) to solve such a “misplaced” exercise is likely to ap- 
preciate and to understand the subsequent developments much better for 
his attempt. Having in mind possible future editions of the book, I ask 
the reader to let me know about errors in the exercises, and to suggest im- 
provements and additions. (Needless to say, the same goes for the text.) 

None of the theorems and only very few of the exercises are my discovery;
most of them are known to most working mathematicians, and have been
known for a long time. Although I do not give a detailed list of my sources,
I am nevertheless deeply aware of my indebtedness to the books and papers 
from which I learned and to the friends and strangers who, before and 
after the publication of the first version, gave me much valuable encourage- 
ment and criticism. I am particularly grateful to three men: J. L. Doob 
and Arlen Brown, who read the entire manuscript of the first and the 
second version, respectively, and made many useful suggestions, and 
John von Neumann, who was one of the originators of the modern spirit 
and methods that I have tried to present and whose teaching was the
inspiration for this book.
P. R. H. 
CHAPTER I


SPACES 


§ 1. Fields


In what follows we shall have occasion to use various classes of numbers 
(such as the class of all real numbers or the class of all complex numbers). 
Because we should not, at this early stage, commit ourselves to any specific 
class, we shall adopt the dodge of referring to numbers as scalars. The 
reader will not lose anything essential if he consistently interprets scalars 
as real numbers or as complex numbers; in the examples that we shall 
study both classes will occur. To be specific (and also in order to operate 
at the proper level of generality) we proceed to list all the general facts 
about scalars that we shall need to assume. 


(A) To every pair, α and β, of scalars there corresponds a scalar α + β,
called the sum of α and β, in such a way that

(1) addition is commutative, α + β = β + α,

(2) addition is associative, α + (β + γ) = (α + β) + γ,

(3) there exists a unique scalar 0 (called zero) such that α + 0 = α for
every scalar α, and

(4) to every scalar α there corresponds a unique scalar −α such that
α + (−α) = 0.


(B) To every pair, α and β, of scalars there corresponds a scalar αβ,
called the product of α and β, in such a way that

(1) multiplication is commutative, αβ = βα,

(2) multiplication is associative, α(βγ) = (αβ)γ,

(3) there exists a unique non-zero scalar 1 (called one) such that α1 = α
for every scalar α, and

(4) to every non-zero scalar α there corresponds a unique scalar α^{-1}
(or 1/α) such that αα^{-1} = 1.


(C) Multiplication is distributive with respect to addition,
α(β + γ) = αβ + αγ.


If addition and multiplication are defined within some set of objects 
(scalars) so that the conditions (A), (B), and (C) are satisfied, then that 
set (together with the given operations) is called a field. Thus, for example, 
the set ℚ of all rational numbers (with the ordinary definitions of sum
and product) is a field, and the same is true of the set ℝ of all real numbers
and the set ℂ of all complex numbers.
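
The axioms can be spot-checked numerically for the field of rational numbers. The following is only a sketch and is not part of the original text: it relies on Python's standard `fractions.Fraction` type, and the name `check_field_axioms` is a hypothetical helper chosen for illustration. A check on finitely many samples illustrates the axioms; it does not prove them.

```python
from fractions import Fraction
from itertools import product

def check_field_axioms(samples):
    """Spot-check axioms (A), (B), (C) on a finite list of rational samples."""
    zero, one = Fraction(0), Fraction(1)
    for a, b, c in product(samples, repeat=3):
        assert a + b == b + a                  # (A1) addition is commutative
        assert a + (b + c) == (a + b) + c      # (A2) addition is associative
        assert a * b == b * a                  # (B1) multiplication is commutative
        assert a * (b * c) == (a * b) * c      # (B2) multiplication is associative
        assert a * (b + c) == a * b + a * c    # (C) distributive law
    for a in samples:
        assert a + zero == a                   # (A3) zero
        assert a + (-a) == zero                # (A4) additive inverse
        assert a * one == a                    # (B3) one
        if a != zero:
            assert a * (1 / a) == one          # (B4) multiplicative inverse
    return True

# A small, arbitrary sample of rational numbers (assumed for illustration).
samples = [Fraction(n, d) for n in range(-3, 4) for d in (1, 2, 5)]
result = check_field_axioms(samples)
```

Exact rational arithmetic is used on purpose; floating-point numbers would violate associativity in rare cases and so would not model a field faithfully.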


EXERCISES 


1. Almost all the laws of elementary arithmetic are consequences of the axioms
defining a field. Prove, in particular, that if F is a field, and if α, β, and γ belong
to F, then the following relations hold.

(a) 0 + α = α.

(b) If α + β = α + γ, then β = γ.

(c) α + (β − α) = β. (Here β − α = β + (−α).)

(d) α·0 = 0·α = 0. (For clarity or emphasis we sometimes use the dot to indicate
multiplication.)

(e) (−1)α = −α.

(f) (−α)(−β) = αβ.

(g) If αβ = 0, then either α = 0 or β = 0 (or both).


2. (a) Is the set of all positive integers a field? (In familiar systems, such as the 
integers, we shall almost always use the ordinary operations of addition and multi- 
plication. On the rare occasions when we depart from this convention, we shall 
give ample warning. As for “positive,” by that word we mean, here and elsewhere 
in this book, “greater than or equal to zero.” If 0 is to be excluded, we shall say 
“strictly positive.”)

(b) What about the set of all integers? 

(c) Can the answers to these questions be changed by re-defining addition or 
multiplication (or both)? 


3. Let m be an integer, m ≥ 2, and let Z_m be the set of all positive integers less
than m, Z_m = {0, 1, ···, m − 1}. If α and β are in Z_m, let α + β be the least
positive remainder obtained by dividing the (ordinary) sum of α and β by m, and,
similarly, let αβ be the least positive remainder obtained by dividing the (ordinary)
product of α and β by m. (Example: if m = 12, then 3 + 11 = 2 and 3·11 = 9.)

(a) Prove that Z_m is a field if and only if m is a prime.

(b) What is −1 in Z_5?

(c) What is 1/3 in Z_7?


4. The example of Z_p (where p is a prime) shows that not quite all the laws of
elementary arithmetic hold in fields; in Z_2, for instance, 1 + 1 = 0. Prove that
if F is a field, then either the result of repeatedly adding 1 to itself is always
different from 0, or else the first time that it is equal to 0 occurs when the number
of summands is a prime. (The characteristic of the field F is defined to be 0 in the
first case and the crucial prime in the second.)
5. Let ℚ(√2) be the set of all real numbers of the form α + β√2, where
α and β are rational.

(a) Is ℚ(√2) a field?

(b) What if α and β are required to be integers?


6. (a) Does the set of all polynomials with integer coefficients form a field? 
(b) What if the coefficients are allowed to be real numbers? 


7. Let F be the set of all (ordered) pairs (α, β) of real numbers.
(a) If addition and multiplication are defined by

(α, β) + (γ, δ) = (α + γ, β + δ)

and

(α, β)(γ, δ) = (αγ, βδ),

does F become a field?
(b) If addition and multiplication are defined by

(α, β) + (γ, δ) = (α + γ, β + δ)

and

(α, β)(γ, δ) = (αγ − βδ, αδ + βγ),

is F a field then?

(c) What happens (in both the preceding cases) if we consider ordered pairs of
complex numbers instead?
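
The arithmetic of Z_m described in Exercise 3 can be tried out directly. The sketch below is an illustration only; the function names `add_mod` and `mul_mod` are hypothetical and do not come from the text. It reproduces the example given there for m = 12 and the remark of Exercise 4 that 1 + 1 = 0 in Z_2, and it leaves the exercises themselves to the reader.

```python
def add_mod(a, b, m):
    """Sum in Z_m: remainder of the ordinary sum on division by m."""
    return (a + b) % m

def mul_mod(a, b, m):
    """Product in Z_m: remainder of the ordinary product on division by m."""
    return (a * b) % m

example_sum = add_mod(3, 11, 12)       # the example from Exercise 3
example_product = mul_mod(3, 11, 12)   # the example from Exercise 3
one_plus_one = add_mod(1, 1, 2)        # the remark in Exercise 4
```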


§2. Vector spaces 


We come now to the basic concept of this book. For the definition 
that follows we assume that we are given a particular field F; the scalars
to be used are to be elements of F. 


DEFINITION. A vector space is a set V of elements called vectors satisfying
the following axioms.


(A) To every pair, x and y, of vectors in V there corresponds a vector
x + y, called the sum of x and y, in such a way that

(1) addition is commutative, x + y = y + x,

(2) addition is associative, x + (y + z) = (x + y) + z,

(3) there exists in V a unique vector 0 (called the origin) such that
x + 0 = x for every vector x, and

(4) to every vector x in V there corresponds a unique vector −x such
that x + (−x) = 0.


(B) To every pair, α and x, where α is a scalar and x is a vector in V,
there corresponds a vector αx in V, called the product of α and x, in such
a way that

(1) multiplication by scalars is associative, α(βx) = (αβ)x, and

(2) 1x = x for every vector x.


(C) (1) Multiplication by scalars is distributive with respect to vector
addition, α(x + y) = αx + αy, and

(2) multiplication by vectors is distributive with respect to scalar
addition, (α + β)x = αx + βx.


These axioms are not claimed to be logically independent; they are 
merely a convenient characterization of the objects we wish to study. The 
relation between a vector space V and the underlying field F is usually
described by saying that V is a vector space over F. If F is the field ℝ
of real numbers, V is called a real vector space; similarly if F is ℚ or if F
is ℂ, we speak of rational vector spaces or complex vector spaces.


§ 3. Examples 


Before discussing the implications of the axioms, we give some examples. 
We shall refer to these examples over and over again, and we shall use the 
notation established here throughout the rest of our work. 

(1) Let ℂ^1 (= ℂ) be the set of all complex numbers; if we interpret
x + y and αx as ordinary complex numerical addition and multiplication,
ℂ^1 becomes a complex vector space.

(2) Let P be the set of all polynomials, with complex coefficients, in a
variable t. To make P into a complex vector space, we interpret vector
addition and scalar multiplication as the ordinary addition of two
polynomials and the multiplication of a polynomial by a complex number;
the origin in P is the polynomial identically zero.

Example (1) is too simple and example (2) is too complicated to be 
typical of the main contents of this book. We give now another example 
of complex vector spaces which (as we shall see later) is general enough for 
all our purposes. 

(3) Let ℂ^n, n = 1, 2, ···, be the set of all n-tuples of complex numbers.
If x = (ξ_1, ···, ξ_n) and y = (η_1, ···, η_n) are elements of ℂ^n, we write, by
definition,

x + y = (ξ_1 + η_1, ···, ξ_n + η_n),
αx = (αξ_1, ···, αξ_n),
0 = (0, ···, 0),
−x = (−ξ_1, ···, −ξ_n).


It is easy to verify that all parts of our axioms (A), (B), and (C), § 2, are
satisfied, so that ℂ^n is a complex vector space; it will be called n-dimensional
complex coordinate space.


(4) For each positive integer n, let P_n be the set of all polynomials
(with complex coefficients, as in example (2)) of degree ≤ n − 1, together
with the polynomial identically zero. (In the usual discussion of degree,
the degree of this polynomial is not defined, so that we cannot say that it
has degree ≤ n − 1.) With the same interpretation of the linear operations
(addition and scalar multiplication) as in (2), P_n is a complex vector space.

(5) A close relative of ℂ^n is the set ℝ^n of all n-tuples of real numbers.
With the same formal definitions of addition and scalar multiplication as
for ℂ^n, except that now we consider only real scalars α, the space ℝ^n is
a real vector space; it will be called n-dimensional real coordinate space.

(6) All the preceding examples can be generalized. Thus, for instance, 
an obvious generalization of (1) can be described by saying that every 
field may be regarded as a vector space over itself. A common generalization
of (3) and (5) starts with an arbitrary field F and forms the set F^n
of n-tuples of elements of F; the formal definitions of the linear operations
are the same as for the case F = ℂ.

(7) A field, by definition, has at least two elements; a vector space, 
however, may have only one. Since every vector space contains an origin, 
there is essentially (i.e., except for notation) only one vector space having 
only one vector. This most trivial vector space will be denoted by 𝒪.

(8) If, in the set ℝ of all real numbers, addition is defined as usual and
multiplication of a real number by a rational number is defined as usual,
then ℝ becomes a rational vector space.

(9) If, in the set ℂ of all complex numbers, addition is defined as usual
and multiplication of a complex number by a real number is defined as
usual, then ℂ becomes a real vector space. (Compare this example with
(1); they are quite different.)
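
The coordinatewise definitions of example (3) translate almost literally into code. The following is a minimal sketch, not part of the original text, in which vectors of ℂ^n are modelled as Python tuples of `complex` numbers; the function names are hypothetical. The sample vectors have small integer real and imaginary parts, so that the floating-point arithmetic behind `complex` is exact here.

```python
def vec_add(x, y):
    """x + y = (xi_1 + eta_1, ..., xi_n + eta_n)."""
    return tuple(a + b for a, b in zip(x, y))

def scalar_mul(alpha, x):
    """alpha x = (alpha xi_1, ..., alpha xi_n)."""
    return tuple(alpha * a for a in x)

def origin(n):
    """0 = (0, ..., 0)."""
    return tuple(0j for _ in range(n))

def negate(x):
    """-x = (-xi_1, ..., -xi_n)."""
    return tuple(-a for a in x)

# Two sample vectors of C^3 and a sample scalar (assumed for illustration).
x = (1 + 2j, 3j, -1 + 0j)
y = (2 - 1j, 1 + 0j, 4 + 4j)
alpha = 2 + 1j

# Axiom (C)(1): alpha (x + y) = alpha x + alpha y.
lhs = scalar_mul(alpha, vec_add(x, y))
rhs = vec_add(scalar_mul(alpha, x), scalar_mul(alpha, y))
```

Restricting the scalars passed to `scalar_mul` to real numbers gives the same tuples the structure of a real vector space, which is the distinction drawn between examples (1) and (9).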


§ 4. Comments 


A few comments are in order on our axioms and notation. There are 
striking similarities (and equally striking differences) between the axioms 
for a field and the axioms for a vector space over a field. In both cases, 
the axioms (A) describe the additive structure of the system, the axioms 
(B) describe its multiplicative structure, and the axioms (C) describe the 
connection between the two structures. Those familiar with algebraic 
terminology will have recognized the axioms (A) (in both § 1 and § 2) as
the defining conditions of an abelian (commutative) group; the axioms (B) 
and (C) (in § 2) express the fact that the group admits scalars as operators. 
We mention in passing that if the scalars are elements of a ring (instead 
of a field), the generalized concept corresponding to a vector space is 
called a module. 
Special real vector spaces (such as ℝ^2 and ℝ^3) are familiar in geometry.
There seems at this stage to be no excuse for our apparently uninteresting
insistence on fields other than ℝ, and, in particular, on the field ℂ of complex
numbers. We hope that the reader is willing to take it on faith that we
shall have to make use of deep properties of complex numbers later
(conjugation, algebraic closure), and that in both the applications of vector
spaces to modern (quantum mechanical) physics and the mathematical
generalization of our results to Hilbert space, complex numbers play an
important role. Their one great disadvantage is the difficulty of drawing
pictures; the ordinary picture (Argand diagram) of ℂ^1 is indistinguishable
from that of ℝ^2, and a graphic representation of ℂ^2 seems to be out of human
reach. On the occasions when we have to use pictorial language we shall
therefore use the terminology of ℝ^n in ℂ^n, and speak of ℂ^2, for example,
as a plane.

Finally we comment on notation. We observe that the symbol 0 has 
been used in two meanings: once as a scalar and once as a vector. To make 
the situation worse, we shall later, when we introduce linear functionals 
and linear transformations, give it still other meanings. Fortunately the 
relations among the various interpretations of 0 are such that, after this 
word of warning, no confusion should arise from this practice. 


EXERCISES 


1. Prove that if x and y are vectors and if α is a scalar, then the following
relations hold.

(a) 0 + x = x.

(b) −0 = 0.

(c) α·0 = 0.

(d) 0·x = 0. (Observe that the same symbol is used on both sides of this
equation; on the left it denotes a scalar, on the right it denotes a vector.)

(e) If αx = 0, then either α = 0 or x = 0 (or both).

(f) −x = (−1)x.

(g) y + (x − y) = x. (Here x − y = x + (−y).)


2. If p is a prime, then Z_p^n is a vector space over Z_p (cf. § 1, Ex. 3); how many
vectors are there in this vector space?


3. Let V be the set of all (ordered) pairs of real numbers. If x = (ξ_1, ξ_2) and
y = (η_1, η_2) are elements of V, write

x + y = (ξ_1 + η_1, ξ_2 + η_2)
αx = (αξ_1, 0)
0 = (0, 0)
−x = (−ξ_1, −ξ_2).

Is V a vector space with respect to these definitions of the linear operations? Why?
4. Sometimes a subset of a vector space is itself a vector space (with respect to
the linear operations already given). Consider, for example, the vector space ℂ^3
and the subsets V of ℂ^3 consisting of those vectors (ξ_1, ξ_2, ξ_3) for which

(a) ξ_1 is real,

(b) ξ_1 = 0,

(c) either ξ_1 = 0 or ξ_2 = 0,

(d) ξ_1 + ξ_2 = 0,

(e) ξ_1 + ξ_2 = 1.

In which of these cases is V a vector space?


5. Consider the vector space P and the subsets V of P consisting of those vectors
(polynomials) x for which

(a) x has degree 3,

(b) 2x(0) = x(1),

(c) x(t) ≥ 0 whenever 0 ≤ t ≤ 1,

(d) x(t) = x(1 − t) for all t.

In which of these cases is V a vector space?


§5. Linear dependence 


Now that we have described the spaces we shall work with, we must 
specify the relations among the elements of those spaces that will be of 
interest to us. 

We begin with a few words about the summation notation. If corresponding
to each of a set of indices i there is given a vector x_i, and if it
is not necessary or not convenient to specify the set of indices exactly,
we shall simply speak of a set {x_i} of vectors. (We admit the possibility
that the same vector corresponds to two distinct indices. In all honesty,
therefore, it should be stated that what is important is not which vectors
appear in {x_i}, but how they appear.) If the index-set under consideration
is finite, we shall denote the sum of the corresponding vectors by Σ_i x_i
(or, when desirable, by a more explicit symbol such as Σ_{i=1}^n x_i). In order
to avoid frequent and fussy case distinctions, it is a good idea to admit
into the general theory sums such as Σ_i x_i even when there are no indices
i to be summed over, or, more precisely, even when the index-set under
consideration is empty. (In that case, of course, there are no vectors to
sum, or, more precisely, the set {x_i} is also empty.) The value of such
an “empty sum” is defined, naturally enough, to be the vector 0.
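
The convention for empty sums is the one a program adopts naturally when it starts an accumulation from the origin. The sketch below is illustrative only; the name `vector_sum` is hypothetical, vectors are modelled as tuples of numbers, and the dimension n is passed explicitly so that the function knows which origin to return when there is nothing to add.

```python
def vector_sum(vectors, n):
    """Sum of a finite family of n-tuples; the empty family sums to the origin."""
    total = (0,) * n                      # start from the vector 0
    for v in vectors:
        total = tuple(a + b for a, b in zip(total, v))
    return total

empty_sum = vector_sum([], 3)
# The same vector may correspond to two distinct indices; it is then added twice.
repeated = vector_sum([(1, 2), (3, 4), (1, 2)], 2)
```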


DEFINITION. A finite set {x_i} of vectors is linearly dependent if there
exists a corresponding set {α_i} of scalars, not all zero, such that


Σ_i α_i x_i = 0.


If, on the other hand, Σ_i α_i x_i = 0 implies that α_i = 0 for each i, the
set {x_i} is linearly independent.
The wording of this definition is intended to cover the case of the empty
set; the result in that case, though possibly paradoxical, dovetails very
satisfactorily with the rest of the theory. The result is that the empty
set of vectors is linearly independent. Indeed, if there are no indices i,
then it is not possible to pick out some of them and to assign to the selected
ones a non-zero scalar so as to make a certain sum vanish. The trouble
is not in avoiding the assignment of zero; it is in finding an index to which
something can be assigned. Note that this argument shows that the
empty set is not linearly dependent; for the reader not acquainted with
arguing by “vacuous implication,” the equivalence of the definition of
linear independence with the straightforward negation of the definition
of linear dependence needs a little additional intuitive justification. The
easiest way to feel comfortable about the assertion “Σ_i α_i x_i = 0 implies
that α_i = 0 for each i,” in case there are no indices i, is to rephrase it this
way: “if Σ_i α_i x_i = 0, then there is no index i for which α_i ≠ 0.” This
version is obviously true if there is no index i at all.

Linear dependence and independence are properties of sets of vectors; 
it is customary, however, to apply the adjectives to vectors themselves, 
and thus we shall sometimes say “a set of linearly independent vectors” 
instead of “a linearly independent set of vectors.” It will be convenient 
also to speak of the linear dependence and independence of a not necessarily 
finite set, X, of vectors. We shall say that X is linearly independent if 
every finite subset of X is such; otherwise X is linearly dependent. 
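
For vectors with rational coordinates, linear dependence of a finite set can be tested mechanically. The sketch below is not taken from the text: it swaps in Gaussian elimination (row reduction), a standard technique that the book has not introduced at this point, and reports a list of vectors as dependent when elimination produces fewer pivots than there are vectors. The names `rank` and `is_linearly_dependent` are hypothetical helpers.

```python
from fractions import Fraction

def rank(vectors):
    """Number of pivots found by Gaussian elimination over the rationals."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0                                   # number of pivots found so far
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue                        # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def is_linearly_dependent(vectors):
    """True if some non-trivial linear combination of the vectors is 0."""
    return rank(vectors) < len(vectors)
```

With these conventions the empty list is reported as linearly independent, in agreement with the discussion above, and a list in which the same vector appears under two indices is reported as dependent.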

To gain insight into the meaning of linear dependence, let us study the 
examples of vector spaces that we already have. 

(1) If x and y are any two vectors in ℂ^1, then x and y form a linearly
dependent set. If x = y = 0, this is trivial; if not, then we have, for
example, the relation yx + (−x)y = 0. Since it is clear that every set
containing a linearly dependent subset is itself linearly dependent, this
shows that in ℂ^1 every set containing more than one element is a linearly
dependent set.

(2) More interesting is the situation in the space P. The vectors x, y,
and z, defined by


x(t) = 1 − t,
y(t) = t(1 − t),
z(t) = 1 − t^2,


are, for example, linearly dependent, since x + y − z = 0. However, the
infinite set of vectors x_0, x_1, x_2, ···, defined by


x_0(t) = 1, x_1(t) = t, x_2(t) = t^2, ···,

is a linearly independent set, for if we had any relation of the form

α_0 x_0 + α_1 x_1 + ··· + α_n x_n = 0,

then we should have a polynomial identity

α_0 + α_1 t + ··· + α_n t^n = 0,

whence α_0 = α_1 = ··· = α_n = 0.


(3) As we mentioned before, the spaces ℂ^n are the prototype of what
we want to study; let us examine, for example, the case n = 3. To those
familiar with higher-dimensional geometry, the notion of linear dependence
in this space (or, more properly speaking, in its real analogue ℝ^3) has a
concrete geometric meaning, which we shall only mention. In geometrical 
language, two vectors are linearly dependent if and only if they are col- 
linear with the origin, and three vectors are linearly dependent if and 
only if they are coplanar with the origin. (If one thinks of a vector not 
as a point in a space but as an arrow pointing from the origin to some given 
point, the preceding sentence should be modified by crossing out the phrase 
“with the origin” both times that it occurs.) We shall presently introduce 
the notion of linear manifolds (or vector subspaces) in a vector space, and, 
in that connection, we shall occasionally use the language suggested by 
such geometrical considerations. 
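
The dependence relation x + y − z = 0 of example (2) can be verified by direct computation with coefficients. In the sketch below, which is an illustration and not part of the text, a polynomial is modelled as the list of its coefficients in increasing powers of t; the helper names `poly_add` and `poly_scale` are hypothetical.

```python
from itertools import zip_longest

def poly_add(p, q):
    """Add polynomials given as coefficient lists [c0, c1, c2, ...]."""
    return [a + b for a, b in zip_longest(p, q, fillvalue=0)]

def poly_scale(alpha, p):
    """Multiply a polynomial by the scalar alpha."""
    return [alpha * a for a in p]

x = [1, -1]       # x(t) = 1 - t
y = [0, 1, -1]    # y(t) = t(1 - t) = t - t^2
z = [1, 0, -1]    # z(t) = 1 - t^2

# The linear combination 1*x + 1*y + (-1)*z, coefficient by coefficient.
combination = poly_add(poly_add(x, y), poly_scale(-1, z))
```

Every coefficient of `combination` is zero, so the scalars 1, 1, −1 (not all zero) exhibit the linear dependence of x, y, and z.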


§ 6. Linear combinations 


We shall say, whenever x = Σ_i α_i x_i, that x is a linear combination of
{x_i}; we shall use without any further explanation all the simple
grammatical implications of this terminology. Thus we shall say, in case x
is a linear combination of {x_i}, that x is linearly dependent on {x_i}; we
shall leave to the reader the proof that if {x_i} is linearly independent,
then a necessary and sufficient condition that x be a linear combination
of {x_i} is that the enlarged set, obtained by adjoining x to {x_i}, be linearly
dependent. Note that, in accordance with the definition of an empty
sum, the origin is a linear combination of the empty set of vectors; it is,
moreover, the only vector with this property.

The following theorem is the fundamental result concerning linear 
dependence. 


THEOREM. The set of non-zero vectors x_1, ···, x_n is linearly dependent
if and only if some x_k, 2 ≤ k ≤ n, is a linear combination of the preceding
ones.


PROOF. Let us suppose that the vectors x_1, ···, x_n are linearly dependent,
and let k be the first integer between 2 and n for which x_1, ···, x_k are linearly
dependent. (If worse comes to worst, our assumption assures us that
k = n will do.) Then

α_1 x_1 + ··· + α_k x_k = 0

for a suitable set of α's (not all zero); moreover, whatever the α's, we cannot
have α_k = 0, for then we should have a linear dependence relation
among x_1, ···, x_{k−1}, contrary to the definition of k. Hence

x_k = (−α_1/α_k) x_1 + ··· + (−α_{k−1}/α_k) x_{k−1},

as was to be proved. This proves the necessity of our condition; sufficiency
is clear since, as we remarked before, every set containing a linearly
dependent set is itself such.
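
The integer k of the proof can be located by a direct search. The sketch below is an illustration under stated assumptions, not the author's method: it handles vectors with rational coordinates only, and it decides dependence by Gaussian elimination, a technique the text has not developed. The names `rank` and `first_dependent_index` are hypothetical.

```python
from fractions import Fraction

def rank(vectors):
    """Number of pivots found by Gaussian elimination over the rationals."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def first_dependent_index(vectors):
    """Smallest k (counting from 1) with x_1, ..., x_k dependent, else None."""
    for k in range(1, len(vectors) + 1):
        if rank(vectors[:k]) < k:
            return k
    return None

k = first_dependent_index([(1, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 1)])
```

For non-zero vectors the value returned is at least 2, and by the argument of the proof the vector x_k is then a linear combination of x_1, ···, x_{k−1}.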


§ 7. Bases 


DEFINITION. A (linear) basis (or a coordinate system) in a vector space
V is a set X of linearly independent vectors such that every vector in
V is a linear combination of elements of X. A vector space V is
finite-dimensional if it has a finite basis.


Except for the occasional consideration of examples we shall restrict 
our attention, throughout this book, to finite-dimensional vector spaces. 

For examples of bases we turn again to the spaces P and ℂ^n. In P,
the set {x_n}, where x_n(t) = t^n, n = 0, 1, 2, ···, is a basis; every
polynomial is, by definition, a linear combination of a finite number of x_n.
Moreover P has no finite basis, for, given any finite set of polynomials,
we can find a polynomial of higher degree than any of them; this latter
polynomial is obviously not a linear combination of the former ones.

An example of a basis in ℂ^n is the set of vectors x_i, i = 1, ···, n, defined
by the condition that the j-th coordinate of x_i is δ_{ij}. (Here we use for
the first time the popular Kronecker δ; it is defined by δ_{ij} = 1 if i = j and
δ_{ij} = 0 if i ≠ j.) Thus we assert that in ℂ^3 the vectors x_1 = (1, 0, 0),
x_2 = (0, 1, 0), and x_3 = (0, 0, 1) form a basis. It is easy to see that they
are linearly independent; the formula


x = (ξ_1, ξ_2, ξ_3) = ξ_1 x_1 + ξ_2 x_2 + ξ_3 x_3


proves that every x in ℂ^3 is a linear combination of them.
In a general finite-dimensional vector space V, with basis {x_1, ···, x_n},
we know that every x can be written in the form


x = Σ_i ξ_i x_i;

we assert that the ξ's are uniquely determined by x. The proof of this
assertion is an argument often used in the theory of linear dependence.
If we had x = Σ_i η_i x_i, then we should have, by subtraction,


Σ_i (ξ_i − η_i) x_i = 0.


Since the x_i are linearly independent, this implies that ξ_i − η_i = 0 for
i = 1, ···, n; in other words, the ξ's are the same as the η's. (Observe
that writing {x_1, ···, x_n} for a basis with n elements is not the proper thing
to do in case n = 0. We shall, nevertheless, frequently use this notation.
Whenever that is done, it is, in principle, necessary to adjoin a separate
discussion designed to cover the vector space 𝒪. In fact, however,
everything about that space is so trivial that the details are not worth writing
down, and we shall omit them.)


THEOREM. If V is a finite-dimensional vector space and if {y_1, ···, y_m}
is any set of linearly independent vectors in V, then, unless the y's already
form a basis, we can find vectors y_{m+1}, ···, y_{m+p} so that the totality of the
y's, that is, {y_1, ···, y_m, y_{m+1}, ···, y_{m+p}}, is a basis. In other words, every
linearly independent set can be extended to a basis.


PROOF. Since V is finite-dimensional, it has a finite basis, say {x_1, ···,
x_n}. We consider the set S of vectors


y_1, ···, y_m, x_1, ···, x_n,


in this order, and we apply to this set the theorem of § 6 several times in
succession. In the first place, the set S is linearly dependent, since the
y's are (as are all vectors) linear combinations of the x's. Hence some
vector of S is a linear combination of the preceding ones; let z be the first
such vector. Then z is different from any y_i, i = 1, ···, m (since the
y's are linearly independent), so that z is equal to some x, say z = x_i.
We consider the new set S′ of vectors


y_1, ···, y_m, x_1, ···, x_{i−1}, x_{i+1}, ···, x_n.


We observe that every vector in V is a linear combination of vectors in
S′, since by means of y_1, ···, y_m, x_1, ···, x_{i−1} we may express x_i, and
then by means of x_1, ···, x_{i−1}, x_i, x_{i+1}, ···, x_n we may express any vector.
(The x's form a basis.) If S′ is linearly independent, we are done. If
it is not, we apply the theorem of § 6 again and again the same way till
we reach a linearly independent set containing y_1, ···, y_m, in terms of
which we may express every vector in V. This last set is a basis containing
the y's.
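
The proof is constructive, and its steps can be followed by a short program. The sketch below is an illustration under stated assumptions rather than part of the text: vectors have rational coordinates, dependence is decided by Gaussian elimination (a technique not developed in the book at this point), and the names `rank`, `first_dependent_index`, and `extend_to_basis` are hypothetical. The caller is assumed to supply a linearly independent list of y's and a basis of x's.

```python
from fractions import Fraction

def rank(vectors):
    """Number of pivots found by Gaussian elimination over the rationals."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def first_dependent_index(vectors):
    """Smallest k (counting from 1) with the first k vectors dependent, else None."""
    for k in range(1, len(vectors) + 1):
        if rank(vectors[:k]) < k:
            return k
    return None

def extend_to_basis(independent, basis):
    """Form S = y's followed by x's; repeatedly drop the first vector that is
    a linear combination of the preceding ones, as in the proof."""
    s = list(independent) + list(basis)
    while True:
        k = first_dependent_index(s)
        if k is None:
            return s                # S is now linearly independent
        del s[k - 1]                # the dropped vector is always one of the x's

extended = extend_to_basis([(1, 1, 0)], [(1, 0, 0), (0, 1, 0), (0, 0, 1)])
```

In this example the first vector found to depend on its predecessors is (0, 1, 0), so the y-vector (1, 1, 0) is kept and is completed to a basis by (1, 0, 0) and (0, 0, 1).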
EXERCISES

1. (a) Prove that the four vectors

x = (1, 0, 0),
y = (0, 1, 0),
z = (0, 0, 1),
u = (1, 1, 1),


in ℂ^3 form a linearly dependent set, but any three of them are linearly independent.
(To test the linear dependence of vectors x = (ξ_1, ξ_2, ξ_3), y = (η_1, η_2, η_3), and
z = (ζ_1, ζ_2, ζ_3) in ℂ^3, proceed as follows. Assume that α, β, and γ can be found
so that αx + βy + γz = 0. This means that


αξ_1 + βη_1 + γζ_1 = 0,
αξ_2 + βη_2 + γζ_2 = 0,
αξ_3 + βη_3 + γζ_3 = 0.


The vectors x, y, and z are linearly dependent if and only if these equations have a
solution other than α = β = γ = 0.)

(b) If the vectors x, y, z, and u in P are defined by x(t) = 1, y(t) = t, z(t) = t^2,
and u(t) = 1 + t + t^2, prove that x, y, z, and u are linearly dependent, but any
three of them are linearly independent.


2. Prove that if ℝ is considered as a rational vector space (see § 3, (8)), then a
necessary and sufficient condition that the vectors 1 and ξ in ℝ be linearly
independent is that the real number ξ be irrational.


3. Is it true that if x, y, and z are linearly independent vectors, then so also are
x + y, y + z, and z + x?


4. (a) Under what conditions on the scalar ξ are the vectors (1 + ξ, 1 − ξ)
and (1 − ξ, 1 + ξ) in ℂ^2 linearly dependent?

(b) Under what conditions on the scalar ξ are the vectors (ξ, 1, 0), (1, ξ, 1),
and (0, 1, ξ) in ℝ^3 linearly dependent?

(c) What is the answer to (b) for ℚ^3 (in place of ℝ^3)?


5. (a) The vectors (ξ_1, ξ_2) and (η_1, η_2) in ℂ^2 are linearly dependent if and only if
ξ_1 η_2 = ξ_2 η_1.

(b) Find a similar necessary and sufficient condition for the linear dependence
of two vectors in ℂ^3. Do the same for three vectors in ℂ^3.

(c) Is there a set of three linearly independent vectors in ℂ^2?


6. (a) Under what conditions on the scalars ξ and η are the vectors (1, ξ) and
(1, η) in ℂ^2 linearly dependent?

(b) Under what conditions on the scalars ξ, η, and ζ are the vectors (1, ξ, ξ^2),
(1, η, η^2), and (1, ζ, ζ^2) in ℂ^3 linearly dependent?

(c) Guess and prove a generalization of (a) and (b) to ℂ^n.


7. (a) Find two bases in ℂ^4 such that the only vectors common to both are
(0, 0, 1, 1) and (1, 1, 0, 0).
(b) Find two bases in ℂ^4 that have no vectors in common so that one of them
contains the vectors (1, 0, 0, 0) and (1, 1, 0, 0) and the other one contains the
vectors (1, 1, 1, 0) and (1, 1, 1, 2).


8. (a) Under what conditions on the scalar ξ do the vectors (1, 1, 1) and (1, ξ, ξ^2)
form a basis of C^3?

(b) Under what conditions on the scalar ξ do the vectors (0, 1, ξ), (ξ, 0, 1), and
(ξ, 1, 1 + ξ) form a basis of C^3?


9. Consider the set of all those vectors in C^3 each of whose coordinates is either
0 or 1; how many different bases does this set contain? 


10. If X is the set consisting of the six vectors (1, 1, 0, 0), (1, 0, 1, 0), (1, 0, 0, 1), 
(0, 1, 1, 0), (0, 1, 0, 1), (0, 0, 1, 1) in C^4, find two different maximal linearly
independent subsets of X. (A maximal linearly independent subset of X is a linearly 
independent subset Y of X that becomes linearly dependent every time that a vector 
of X that is not already in Y is adjoined to Y.) 


11. Prove that every vector space has a basis. (The proof of this fact is out of 
reach for those not acquainted with some transfinite trickery, such as well-ordering
or Zorn’s lemma.) 


§ 8. Dimension 


THEOREM 1. The number of elements in any basis of a finite-dimensional 
vector space V is the same as in any other basis.


PROOF. The proof of this theorem is a slight refinement of the method 
used in § 6, and, incidentally, it proves something more than the theorem 
states. Let X = {x_1, ..., x_n} and Y = {y_1, ..., y_m} be two finite sets
of vectors, each with one of the two defining properties of a basis; i.e., we
assume that every vector in V is a linear combination of the x's (but not
that the x's are linearly independent), and we assume that the y's are
linearly independent (but not that every vector is a linear combination
of them). We may apply the theorem of § 6, just as above, to the set S
of vectors


y_m, x_1, ..., x_n.


Again we know that every vector is a linear combination of vectors of
S and that S is linearly dependent. Reasoning just as before, we obtain
a set S' of vectors

y_m, x_1, ..., x_{i-1}, x_{i+1}, ..., x_n,


again with the property that every vector is a linear combination of vectors
of S'. Now we write y_{m-1} in front of the vectors of S' and apply the same
argument. Continuing in this way, we see that the x's will not be exhausted
before the y's, since otherwise the remaining y's would have to be linear
combinations of the ones already incorporated into S, whereas we know
that the y's are linearly independent. In other words, after the argument
has been applied m times, we obtain a set with the same property the
x's had, and this set differs from the set of x's in that m of them are
replaced by y's. This seemingly innocent statement is what we are after;
it implies that n ≥ m. Consequently if both X and Y are bases (so that
they each have both properties), then n ≥ m and m ≥ n.
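
A small worked instance may make the replacement argument concrete. The following is only an illustrative sketch: the space is assumed to be R^2, and the particular vectors x_1, x_2, x_3 (a spanning set) and y_1, y_2 (an independent set) are chosen for convenience and do not come from the text.

```latex
% Assumed data: three spanning vectors and two independent vectors in R^2.
\[
x_1 = (1,0),\quad x_2 = (0,1),\quad x_3 = (1,1); \qquad
y_1 = (1,1),\quad y_2 = (1,-1).
\]
% Step 1: write y_2 in front of the x's. The first vector that is a linear
% combination of the preceding ones is x_2, and it may be discarded.
\[
\{y_2, x_1, x_2, x_3\}, \qquad x_2 = x_1 - y_2
\;\Longrightarrow\; \{y_2, x_1, x_3\}.
\]
% Step 2: write y_1 in front. Now x_1 is a combination of y_1 and y_2.
\[
\{y_1, y_2, x_1, x_3\}, \qquad x_1 = \tfrac{1}{2}(y_1 + y_2)
\;\Longrightarrow\; \{y_1, y_2, x_3\}.
\]
```

After two steps both y's have been incorporated and one x remains, so the x's were not exhausted before the y's; the final set still has the property that every vector of R^2 is a linear combination of its elements.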


DEFINITION. The dimension of a finite-dimensional vector space V is
the number of elements in a basis of V.


Observe that since the empty set of vectors is a basis of the trivial 
space O, the definition implies that that space has dimension 0. At the 
same time the definition (together with the fact that we have already 
exhibited, in § 7, one particular basis of C^n) at last justifies our terminology
and enables us to announce the pleasant result: n-dimensional coordinate
space is n-dimensional. (Since the argument is the same for R^n and for
C^n, the assertion is true in both the real case and the complex case.)

Our next result is a corollary of Theorem 1 (via the theorem of § 7). 


THEOREM 2. Every set of n + 1 vectors in an n-dimensional vector space
V is linearly dependent. A set of n vectors in V is a basis if and only if it is
linearly independent, or, alternatively, if and only if every vector in V
is a linear combination of elements of the set.
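
The first assertion of Theorem 2 can be checked numerically on a small case. The sketch below is an illustration only, under stated assumptions: the vectors have rational coordinates, linear independence is tested by computing a rank with exact fractions, and the helper names `rank` and `independent` are hypothetical, not notation from the text.

```python
from fractions import Fraction
from itertools import combinations

def rank(vectors):
    """Rank of a list of vectors, by Gaussian elimination over the rationals."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        # Look for a row at or below position r with a non-zero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def independent(vectors):
    """A finite set of vectors is independent when its rank equals its size."""
    return rank(vectors) == len(vectors)

# Four vectors in a 3-dimensional coordinate space (an assumed example).
vectors = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
print(independent(vectors))
print(all(independent(list(t)) for t in combinations(vectors, 3)))
```

The four vectors together are dependent, as the theorem requires of any n + 1 vectors, while each choice of three of them is independent and hence a basis.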


§ 9. Isomorphism 


As an application of the notion of linear basis, or coordinate system, 
we shall now fulfill an implicit earlier promise by showing that every 
finite-dimensional vector space over a field F is essentially the same as
(in technical language, is isomorphic to) some F^n.


DEFINITION. Two vector spaces U and V (over the same field) are
isomorphic if there is a one-to-one correspondence between the vectors
x of U and the vectors y of V, say y = T(x), such that


T(α_1 x_1 + α_2 x_2) = α_1 T(x_1) + α_2 T(x_2).


In other words, U and V are isomorphic if there is an isomorphism (such
as T) between them, where an isomorphism is a one-to-one correspondence
that preserves all linear relations. 


It is easy to see that isomorphic finite-dimensional vector spaces have 
the same dimension; to each basis in one space there corresponds a basis 
in the other space. Thus dimension is an isomorphism invariant; we shall 
now show that it is the only isomorphism invariant, in the sense that every 
two vector spaces with the same finite dimension (over the same field, of
course) are isomorphic. Since the isomorphism of U and V on the one
hand, and of V and W on the other hand, implies that U and W are
isomorphic, it will be sufficient to prove the following theorem.


THEOREM. Every n-dimensional vector space V over a field F is isomorphic
to F^n.


PROOF. Let {x_1, ..., x_n} be any basis in V. Each x in V can be written
in the form ξ_1 x_1 + ... + ξ_n x_n, and we know that the scalars ξ_1, ..., ξ_n
are uniquely determined by x. We consider the one-to-one correspondence


x ↔ (ξ_1, ..., ξ_n)


between V and F^n. If y = η_1 x_1 + ... + η_n x_n, then

αx + βy = (αξ_1 + βη_1)x_1 + ... + (αξ_n + βη_n)x_n;


this establishes the desired isomorphism. 
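
The correspondence between a vector and its coordinates can be tried out on a small case. The sketch below is an illustration under assumptions that are not in the text: the space is the rational plane, the basis `X1`, `X2` is hypothetical, and the coordinates are computed by Cramer's rule.

```python
from fractions import Fraction

# A hypothetical basis of the rational plane, chosen only for illustration.
X1 = (Fraction(1), Fraction(1))
X2 = (Fraction(1), Fraction(-1))

def coordinates(x):
    """Return (xi1, xi2) such that x = xi1*X1 + xi2*X2 (Cramer's rule)."""
    det = X1[0] * X2[1] - X2[0] * X1[1]
    xi1 = (x[0] * X2[1] - X2[0] * x[1]) / det
    xi2 = (X1[0] * x[1] - X1[1] * x[0]) / det
    return (xi1, xi2)

def combine(a, x, b, y):
    """The linear combination a*x + b*y, computed coordinate by coordinate."""
    return tuple(a * p + b * q for p, q in zip(x, y))

x, y = (3, 1), (0, 4)
a, b = 2, -5
# The correspondence preserves linear combinations:
print(coordinates(combine(a, x, b, y)) == combine(a, coordinates(x), b, coordinates(y)))
```

The comparison printed at the end is the defining property of an isomorphism, checked for one pair of vectors and one pair of scalars; it is a spot check, not a proof.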

One might be tempted to say that from now on it would be silly to try 
to preserve an appearance of generality by talking of the general n-di- 
mensional vector space, since we know that, from the point of view of 
studying linear problems, isomorphic vector spaces are indistinguishable, 
and, consequently, we might as well always study F^n. There is one catch.
The most important properties of vectors and vector spaces are the ones 
that are independent of coordinate systems, or, in other words, the ones 
that are invariant under isomorphisms. The correspondence between 
V and F^n was, however, established by choosing a coordinate system; were
we always to study F^n, we would always be tied down to that particular
coordinate system, or else we would always be faced with the chore of 
showing that our definitions and theorems are independent of the co- 
ordinate system in which they happen to be stated. (This horrible dilemma 
will become clear later, on the few occasions when we shall be forced to 
use a particular coordinate system to give a definition.) Accordingly, 
in the greater part of this book, we shall ignore the theorem just proved, 
and we shall treat n-dimensional vector spaces as self-respecting entities, 
independently of any basis. Besides the reasons just mentioned, there is 
another reason for doing this: many special examples of vector spaces, 
such for instance as P_n, would lose a lot of their intuitive content if we were
to transform them into C^n and speak of coordinates only. In studying
vector spaces, such as P_n, and their relation to other vector spaces, we
must be able to handle them with equal ease in different coordinate systems, 
or, and this is essentially the same thing, we must be able to handle them 
without using any coordinate systems at all. 


EXERCISES


1. (a) What is the dimension of the set C of all complex numbers considered
as a real vector space? (See § 3, (9).)

(b) Every complex vector space V is intimately associated with a real vector
space V^-; the space V^- is obtained from V by refusing to multiply vectors of V
by anything other than real scalars. If the dimension of the complex vector space
V is n, what is the dimension of the real vector space V^-?


2. Is the set R of all real numbers a finite-dimensional vector space over the
field Q of all rational numbers? (See § 3, (8). The question is not trivial; it helps
to know something about cardinal numbers.) 


3. How many vectors are there in an n-dimensional vector space over the field 
Z_p (where p is a prime)?


4. Discuss the following assertion: if two rational vector spaces have the same
cardinal number (i.e., if there is some one-to-one correspondence between them),
then they are isomorphic (i.e., there is a linearity-preserving one-to-one
correspondence between them). A knowledge of the basic facts of cardinal arithmetic is
needed for an intelligent discussion.


§ 10. Subspaces 


The objects of interest in geometry are not only the points of the space 
under consideration, but also its lines, planes, etc. We proceed to study
the analogues, in general vector spaces, of these higher-dimensional
elements.


DEFINITION. A non-empty subset M of a vector space V is a subspace
or a linear manifold if along with every pair, x and y, of vectors contained
in M, every linear combination αx + βy is also contained in M.


A word of warning: along with each vector x, a subspace also contains
x − x. Hence if we interpret subspaces as generalized lines and planes,
we must be careful to consider only lines and planes that pass through the 
origin. 

A subspace M in a vector space V is itself a vector space; the reader
can easily verify that, with the same definitions of addition and scalar
multiplication as we had in V, the set satisfies the axioms (A), (B), and (C)
of § 2.

Two special examples of subspaces are: (i) the set O consisting of the
origin only, and (ii) the whole space V. The following examples are less
trivial. 

(1) Let n and m be any two strictly positive integers, m ≤ n. Let M
be the set of all vectors x = (ξ_1, ..., ξ_n) in C^n for which ξ_1 = ... = ξ_m = 0.

(2) With m and n as in (1), we consider the space P_n, and any m real
numbers t_1, ..., t_m. Let M be the set of all vectors (polynomials) x in
P_n for which x(t_1) = ... = x(t_m) = 0.

(3) Let M be the set of all vectors x in P for which x(t) = x(−t) holds
identically in t.

We need some notation and some terminology. For any collection 
{M_ν} of subsets of a given set (say, for example, for a collection of
subspaces in a vector space V), we write ∩_ν M_ν for the intersection of all
M_ν, i.e., for the set of points common to them all. Also, if M and N are
subsets of a set, we write M ⊂ N if M is a subset of N, that is, if every
element of M lies in N also. (Observe that we do not exclude the possibility
M = N; thus we write V ⊂ V as well as O ⊂ V.) For a finite collection
{M_1, ..., M_n}, we shall write M_1 ∩ ... ∩ M_n in place of ∩_j M_j; in case
two subspaces M and N are such that M ∩ N = O, we shall say that
M and N are disjoint.


§ 11. Calculus of subspaces 


THEOREM 1. The intersection of any collection of subspaces is a subspace. 


PROOF. If we use an index ν to tell apart the members of the collection,
so that the given subspaces are M_ν, let us write


M = ∩_ν M_ν.


Since every M_ν contains 0, so does M, and therefore M is not empty. If
x and y belong to M (that is, to all M_ν), then αx + βy belongs to all M_ν,
and therefore M is a subspace.

To see an application of this theorem, suppose that S is an arbitrary set
of vectors (not necessarily a subspace) in a vector space V. There certainly
exist subspaces M containing every element of S (that is, such that S ⊂ M);
the whole space is, for example, such a subspace. Let M be the
intersection of all the subspaces containing S; it is clear that M itself is a
subspace containing S. It is clear, moreover, that M is the smallest such
subspace; if S is also contained in the subspace N, S ⊂ N, then M ⊂ N.
The subspace M so defined is called the subspace spanned by S or the span
of S. The following result establishes the connection between the notion
of spanning and the concepts studied in §§ 5-9.


THEOREM 2. If S is any set of vectors in a vector space V and if M is the
subspace spanned by S, then M is the same as the set of all linear combinations
of elements of S.


PROOF. It is clear that a linear combination of linear combinations
of elements of S may again be written as a linear combination of elements
of S. Hence the set of all linear combinations of elements of S is a
subspace containing S; it follows that this subspace must also contain M.
Now turn the argument around: M contains S and is a subspace; hence M
contains all linear combinations of elements of S.
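
Theorem 2 suggests a practical membership test for a span: a vector v is a linear combination of the elements of a finite set S exactly when adjoining v to S does not raise the rank. The sketch below illustrates this under assumptions not made in the text: rational coordinates, exact arithmetic, and the hypothetical helper names `rank` and `in_span`.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors, by Gaussian elimination over the rationals."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def in_span(v, S):
    """True when v is a linear combination of the vectors in S."""
    return rank(list(S) + [v]) == rank(list(S))

S = [(1, 1, 0), (0, 1, 1)]
print(in_span((1, 2, 1), S))   # (1, 2, 1) = (1, 1, 0) + (0, 1, 1)
print(in_span((1, 0, 0), S))
```

The first vector lies in the span because it is the sum of the two given vectors; the second does not, since a(1, 1, 0) + b(0, 1, 1) = (1, 0, 0) would force a = 1, b = 0, and a + b = 0 at once.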

We see therefore that in our new terminology we may define a linear 
basis as a set of linearly independent vectors that spans the whole space. 

Our next result is an easy consequence of Theorem 2; its proof may be 
safely left to the reader. 


THEOREM 3. If H and K are any two subspaces and if M is the subspace
spanned by H and K together, then M is the same as the set of all vectors
of the form x + y, with x in H and y in K.


Prompted by this theorem, we shall use the notation H + K for the
subspace M spanned by H and K. We shall say that a subspace K of
a vector space V is a complement of a subspace H if H ∩ K = O and
H + K = V.


§ 12. Dimension of a subspace 


THEOREM 1. A subspace M in an n-dimensional vector space V is a vector
space of dimension ≤ n.


PROOF. It is possible to give a deceptively short proof of this theorem 
that runs as follows. Every set of n + 1 vectors in V is linearly dependent,
hence the same is true of M; hence, in particular, the number of elements
in each basis of M is ≤ n, Q.E.D.

The trouble with this argument is that we defined dimension n by 
requiring in the first place that there exist a finite basis, and then demanding 
that this basis contain exactly n elements. The proof above shows only 
that no basis can contain more than n elements; it does not show that 
any basis exists. Once the difficulty is observed, however, it is easy to 
fill the gap. If M = O, then M is 0-dimensional, and we are done. If M
contains a non-zero vector x_1, let M_1 (⊂ M) be the subspace spanned by
x_1. If M = M_1, then M is 1-dimensional, and we are done. If M ≠ M_1,
let x_2 be an element of M not contained in M_1, and let M_2 be the
subspace spanned by x_1 and x_2; and so on. Now we may legitimately employ
the argument given above; after no more than n steps of this sort, the 
process reaches an end, since (by § 8, Theorem 2) we cannot find n + 1 
linearly independent vectors. 

The following result is an important consequence of this second and 
correct proof of Theorem 1. 


THEOREM 2. Given any m-dimensional subspace M in an n-dimensional
vector space V, we can find a basis {x_1, ..., x_m, x_{m+1}, ..., x_n} in V so
that x_1, ..., x_m are in M and form, therefore, a basis of M.


We shall denote the dimension of a vector space V by the symbol dim V.
In this notation Theorem 1 asserts that if M is a subspace of a
finite-dimensional vector space V, then dim M ≤ dim V.
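
The step-by-step construction in the proof of Theorem 1 also shows how a basis of a subspace can be enlarged to a basis of the whole space, as Theorem 2 asserts. The sketch below is an illustration only, under assumptions that are not in the text: the space is an n-dimensional rational coordinate space, the given vectors are already linearly independent, the candidates adjoined are the standard coordinate vectors, and the names `rank` and `extend_to_basis` are hypothetical.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors, by Gaussian elimination over the rationals."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def extend_to_basis(partial, n):
    """Adjoin standard coordinate vectors to an independent list until it has n elements."""
    basis = list(partial)
    for i in range(n):
        e = tuple(1 if j == i else 0 for j in range(n))
        # Keep e only if it is not a linear combination of the vectors chosen so far.
        if rank(basis + [e]) > len(basis):
            basis.append(e)
    return basis

M_basis = [(1, 1, 0)]                 # basis of a 1-dimensional subspace M
basis = extend_to_basis(M_basis, 3)
print(basis)
print(rank(basis) == 3)
```

The first vectors of the resulting basis are the given basis of M, and the process stops after at most n vectors because no n + 1 vectors can be linearly independent.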


EXERCISES 


1. If M and N are finite-dimensional subspaces with the same dimension, and
if M ⊂ N, then M = N.


2. If M and N are subspaces of a vector space V, and if every vector in V belongs
either to M or to N (or both), then either M = V or N = V (or both).


3. If x, y, and z are vectors such that x + y + z = 0, then x and y span the
same subspace as y and z.


4. Suppose that x and y are vectors and M is a subspace in a vector space V;
let H be the subspace spanned by M and x, and let K be the subspace spanned
by M and y. Prove that if y is in H but not in M, then x is in K.


5. Suppose that L, M, and N are subspaces of a vector space.
(a) Show that the equation


L ∩ (M + N) = (L ∩ M) + (L ∩ N)


is not necessarily true.
(b) Prove that


L ∩ (M + (L ∩ N)) = (L ∩ M) + (L ∩ N).


6. (a) Can it happen that a non-trivial subspace of a vector space V (i.e., a
subspace different from both O and V) has a unique complement?

(b) If M is an m-dimensional subspace in an n-dimensional vector space, then
every complement of M has dimension n − m.


7. (a) Show that if both M and N are three-dimensional subspaces of a
five-dimensional vector space, then M and N are not disjoint.
(b) If M and N are finite-dimensional subspaces of a vector space, then


dim M + dim N = dim (M + N) + dim (M ∩ N).


8. A polynomial x is called even if x(−t) = x(t) identically in t (see § 10, (3)),
and it is called odd if x(−t) = −x(t).

(a) Both the class M of even polynomials and the class N of odd polynomials
are subspaces of the space P of all (complex) polynomials.

(b) Prove that M and N are each other's complements.


§ 13. Dual spaces


DEFINITION. A linear functional on a vector space V is a scalar-valued
function y defined for every vector x, with the property that (identically
in the vectors x_1 and x_2 and the scalars α_1 and α_2)


y(α_1 x_1 + α_2 x_2) = α_1 y(x_1) + α_2 y(x_2).

Let us look at some examples of linear functionals.


(1) For x = (ξ_1, ..., ξ_n) in C^n, write y(x) = ξ_1. More generally, let
α_1, ..., α_n be any n scalars and write


y(x) = α_1 ξ_1 + ... + α_n ξ_n.

We observe that for any linear functional y on any vector space

y(0) = y(0·0) = 0·y(0) = 0;


for this reason a linear functional, as we defined it, is sometimes called
homogeneous. In particular in C^n, if y is defined by


y(x) = α_1 ξ_1 + ... + α_n ξ_n + β,


then y is not a linear functional unless β = 0.

(2) For any polynomial x in P, write y(x) = x(0). More generally,
let α_1, ..., α_n be any n scalars, let t_1, ..., t_n be any n real numbers, and
write

y(x) = α_1 x(t_1) + ... + α_n x(t_n).


Another example, in a sense a limiting case of the one just given, is obtained
as follows. Let (a, b) be any finite interval on the real t-axis, and let α
be any complex-valued integrable function defined on (a, b); define y by


y(x) = ∫_a^b α(t) x(t) dt.


(3) On an arbitrary vector space V, define y by writing


y(x) = 0

for every x in V.

The last example is the first hint of a general situation. Let V be any
vector space and let V' be the collection of all linear functionals on V.
Let us denote by 0 the linear functional defined in (3) (compare the comment
at the end of § 4). If y_1 and y_2 are linear functionals on V and if α_1 and
α_2 are scalars, let us write y for the function defined by


y(x) = α_1 y_1(x) + α_2 y_2(x).

It is easy to see that y is a linear functional; we denote it by α_1 y_1 + α_2 y_2.
With these definitions of the linear concepts (zero, addition, scalar
multiplication), the set V' forms a vector space, the dual space of V.
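
The linear operations on functionals are defined pointwise, and this can be imitated directly with functions as values. The sketch below is only an illustration under assumptions not made in the text: vectors are tuples of rational (here integer) coordinates, and the helper names `functional`, `lin_comb`, and `zero` are hypothetical.

```python
def functional(alphas):
    """The functional y(x) = alpha_1*xi_1 + ... + alpha_n*xi_n, as in example (1)."""
    return lambda x: sum(a * xi for a, xi in zip(alphas, x))

def lin_comb(a1, y1, a2, y2):
    """The functional a1*y1 + a2*y2, defined pointwise."""
    return lambda x: a1 * y1(x) + a2 * y2(x)

# The zero functional of example (3).
zero = lambda x: 0

y1 = functional((1, 0, 0))
y2 = functional((1, 2, 3))
y = lin_comb(2, y1, -1, y2)          # 2*y1 - y2

x = (1, 1, 1)
# Pointwise combination agrees with combining the coefficients: 2*(1,0,0) - (1,2,3).
print(y(x) == functional((1, -2, -3))(x))
```

The final comparison reflects the fact that, for functionals of the form in example (1), forming α_1 y_1 + α_2 y_2 amounts to forming the same linear combination of their coefficient lists.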


§ 14. Brackets 


Before studying linear functionals and dual spaces in more detail, we 
wish to introduce a notation that may appear weird at first sight but that 
will clarify many situations later on. Usually we denote a linear functional 
by a single letter such as y. Sometimes, however, it is necessary to use 
the function notation fully and to indicate somehow that if y is a linear 
functional on V and if x is a vector in V, then y(x) is a particular scalar.
According to the notation we propose to adopt here, we shall not write
y followed by x in parentheses, but, instead, we shall write x and y enclosed
between square brackets and separated by a comma. Because of the un- 
usual nature of this notation, we shall expend on it some further verbiage. 

As we have just pointed out, [x, y] is a substitute for the ordinary
function symbol y(x); both these symbols denote the scalar we obtain if we
take the value of the linear function y at the vector x. Let us take an
analogous situation (concerned with functions that are, however, not
linear). Let y be the real function of a real variable defined for each real
number x by y(x) = x^2. The notation [x, y] is a symbolic way of writing
down the recipe for actual operations performed; it corresponds to the
sentence [take a number, and square it].

Using this notation, we may sum up: to every vector space V we make
correspond the dual space V' consisting of all linear functionals on V;
to every pair, x and y, where x is a vector in V and y is a linear functional
in V', we make correspond the scalar [x, y] defined to be the value of y
at x. In terms of the symbol [x, y] the defining property of a linear
functional is


(1) [α_1 x_1 + α_2 x_2, y] = α_1[x_1, y] + α_2[x_2, y],

and the definition of the linear operations for linear functionals is

(2) [x, α_1 y_1 + α_2 y_2] = α_1[x, y_1] + α_2[x, y_2].


The two relations together are expressed by saying that [x, y] is a bilinear
functional of the vectors x in V and y in V'.


EXERCISES


1. Consider the set C of complex numbers as a real vector space (as in § 3, (9)).
Suppose that for each x = ξ_1 + iξ_2 in C (where ξ_1 and ξ_2 are real numbers and
i = √−1) the function y is defined by

(a) y(x) = ξ_1,

(b) y(x) = ξ_2,

(c) y(x) = ξ_1^2,

(d) y(x) = ξ_1 − iξ_2,

(e) y(x) = √(ξ_1^2 + ξ_2^2). (The square root sign attached to a positive number
always denotes the positive square root of that number.)

In which of these cases is y a linear functional? 


2. Suppose that for each x = (ξ_1, ξ_2, ξ_3) in C^3 the function y is defined by

(a) y(x) = ξ_1 + ξ_2,

(b) y(x) = ξ_1 − ξ_3^2,

(c) y(x) = ξ_1 + 1,

(d) y(x) = ξ_1 − 2ξ_2 + 3ξ_3.

In which of these cases is y a linear functional? 

3. Suppose that for each x in P the function y is defined by


(a) ule) =f 200 a 
b) ve) = f° cto)? 
©) ve) =f tat) dt, 
D ufc) = fae) a, 
(©) ue) = F 


o ue) = Fl 


In which of these cases is y a linear functional? 


4. If (α_0, α_1, α_2, ...) is an arbitrary sequence of complex numbers, and if x is
an element of P, x(t) = Σ_{i=0}^n ξ_i t^i, write y(x) = Σ_{i=0}^n ξ_i α_i. Prove that y is an
element of P' and that every element of P' can be obtained in this manner by a
suitable choice of the α's.


5. If y is a non-zero linear functional on a vector space V, and if α is an arbitrary
scalar, does there necessarily exist a vector x in V such that [x, y] = α?


6. Prove that if y and z are linear functionals (on the same vector space) such
that [x, y] = 0 whenever [x, z] = 0, then there exists a scalar α such that y = αz.
(Hint: if [x_0, z] ≠ 0, write α = [x_0, y]/[x_0, z].)


§ 15. Dual bases


One more word before embarking on the proofs of the important theo- 
rems. The concept of dual space was defined without any reference to 
coordinate systems; a glance at the following proofs will show a super- 
abundance of coordinate systems. We wish to point out that this phenome- 
non is inevitable; we shall be establishing results concerning dimension, 
and dimension is the one concept (so far) whose very definition is given in 
terms of a basis. 


THEOREM 1. If V is an n-dimensional vector space, if {x_1, ..., x_n} is a
basis in V, and if {α_1, ..., α_n} is any set of n scalars, then there is one
and only one linear functional y on V such that [x_i, y] = α_i for i = 1,
..., n.


PROOF. Every x in V may be written in the form x = ξ_1 x_1 + ... + ξ_n x_n
in one and only one way; if y is any linear functional, then


[x, y] = ξ_1[x_1, y] + ... + ξ_n[x_n, y].


From this relation the uniqueness of y is clear; if [x_i, y] = α_i, then the
value of [x, y] is determined, for every x, by [x, y] = Σ_i ξ_i α_i. The argument
can also be turned around; if we define y by


[x, y] = ξ_1 α_1 + ... + ξ_n α_n,

then y is indeed a linear functional, and [x_i, y] = α_i.


THEOREM 2. If V is an n-dimensional vector space and if X = {x_1,
..., x_n} is a basis in V, then there is a uniquely determined basis X' in
V', X' = {y_1, ..., y_n}, with the property that [x_i, y_j] = δ_ij. Consequently
the dual space of an n-dimensional space is n-dimensional.


The basis X’ is called the dual basis of X. 


PROOF. It follows from Theorem 1 that, for each j = 1, ..., n, a unique
y_j in V' can be found so that [x_i, y_j] = δ_ij; we have only to prove that the
set X' = {y_1, ..., y_n} is a basis in V'.

In the first place, X' is a linearly independent set, for if we had α_1 y_1 +
... + α_n y_n = 0, in other words, if


[x, α_1 y_1 + ... + α_n y_n] = α_1[x, y_1] + ... + α_n[x, y_n] = 0

for all x, then we should have, for x = x_i,


0 = Σ_j α_j[x_i, y_j] = Σ_j α_j δ_ij = α_i.

In the second place, every y in V' is a linear combination of y_1, ..., y_n.
To prove this, write [x_i, y] = α_i; then, for x = Σ_i ξ_i x_i, we have


[x, y] = ξ_1 α_1 + ... + ξ_n α_n.

On the other hand

[x, y_j] = Σ_i ξ_i[x_i, y_j] = ξ_j,


so that, substituting in the preceding equation, we get

[x, y] = α_1[x, y_1] + ... + α_n[x, y_n]
       = [x, α_1 y_1 + ... + α_n y_n].


Consequently y = α_1 y_1 + ... + α_n y_n, and the proof of the theorem is
complete.
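
In coordinates the dual basis can be computed by inverting a matrix. The sketch below is an illustration under assumptions that are not in the text: the space is a rational coordinate space, each functional is represented by its list of coefficients as in § 13, example (1), the basis vectors are the rows of a matrix assumed to be invertible, and the names `inverse`, `dual_basis`, and `bracket` are hypothetical. With these conventions the coefficient list of y_j is the j-th column of the inverse matrix.

```python
from fractions import Fraction

def inverse(matrix):
    """Gauss-Jordan inverse over the rationals (the matrix is assumed invertible)."""
    n = len(matrix)
    aug = [[Fraction(c) for c in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [a / p for a in aug[col]]
        for i in range(n):
            if i != col:
                f = aug[i][col]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[col])]
    return [row[n:] for row in aug]

def dual_basis(basis):
    """Coefficient lists of y_1, ..., y_n: the columns of the inverse matrix."""
    inv = inverse(basis)
    n = len(basis)
    return [tuple(inv[k][j] for k in range(n)) for j in range(n)]

def bracket(x, y):
    """[x, y] for a vector x and a functional y given by its coefficients."""
    return sum(xi * eta for xi, eta in zip(x, y))

X = [(1, 0, 0), (1, 1, 0), (1, 1, 1)]     # an assumed basis
Y = dual_basis(X)
print(all(bracket(X[i], Y[j]) == (1 if i == j else 0)
          for i in range(3) for j in range(3)))
```

The final check is the defining relation [x_i, y_j] = δ_ij, verified for all nine pairs of indices.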

We shall need also the following easy consequence of Theorem 2.


THEOREM 3. If u and v are any two different vectors of the n-dimensional
vector space V, then there exists a linear functional y on V such that [u, y]
≠ [v, y]; or, equivalently, to any non-zero vector x in V there corresponds
a y in V' such that [x, y] ≠ 0.


PROOF. That the two statements in the theorem are indeed equivalent 
is seen by considering x = u − v. We shall, accordingly, prove the latter
statement only. 

Let X = {x_1, ..., x_n} be any basis in V, and let X' = {y_1, ..., y_n} be
the dual basis in V'. If x = Σ_i ξ_i x_i, then (as above) [x, y_j] = ξ_j. Hence
if [x, y] = 0 for all y, and, in particular, if [x, y_j] = 0 for j = 1, ..., n,
then x = 0.


§ 16. Reflexivity 


It is natural to think that if the dual space V' of a vector space V, and
the relations between a space and its dual, are of any interest at all for
V, then they are of just as much interest for V'. In other words, we propose
now to form the dual space (V')' of V'; for simplicity of notation we shall
denote it by V''. The verbal description of an element of V'' is clumsy:
such an element is a linear functional of linear functionals. It is, however,
at this point that the greatest advantage of the notation [x, y] appears;
by means of it, it is easy to discuss V and its relation to V''.

If we consider the symbol [x, y] for some fixed y = y_0, we obtain nothing
new: [x, y_0] is merely another way of writing the value y_0(x) of the function
y_0 at the vector x. If, however, we consider the symbol [x, y] for some
fixed x = x_0, then we observe that the function of the vectors in V', whose
value at y is [x_0, y], is a scalar-valued function that happens to be linear
(see § 14, (2)); in other words, [x_0, y] defines a linear functional on V',
and, consequently, an element of V''.

By this method we have exhibited some linear functionals on V'; have
we exhibited them all? For the finite-dimensional case the following
theorem furnishes the affirmative answer.


THEOREM. If V is a finite-dimensional vector space, then corresponding
to every linear functional z_0 on V' there is a vector x_0 in V such that z_0(y)
= [x_0, y] = y(x_0) for every y in V'; the correspondence z_0 ↔ x_0 between
V'' and V is an isomorphism.


The correspondence described in this statement is called the natural
correspondence between V'' and V.


PROOF. Let us view the correspondence from the standpoint of going
from V to V''; in other words, to every x_0 in V we make correspond a
vector z_0 in V'' defined by z_0(y) = y(x_0) for every y in V'. Since [x, y]
depends linearly on x, the transformation x_0 → z_0 is linear.

We shall show that this transformation is one-to-one, as far as it goes.
We assert, in other words, that if x_1 and x_2 are in V, and if z_1 and z_2 are
the corresponding vectors in V'' (so that z_1(y) = [x_1, y] and z_2(y) = [x_2, y]
for all y in V'), and if z_1 = z_2, then x_1 = x_2. To say that z_1 = z_2 means
that [x_1, y] = [x_2, y] for every y in V'; the desired conclusion follows from
§ 15, Theorem 3.

The last two paragraphs together show that the set of those linear 
functionals z on V' (that is, elements of V'') that do have the desired form
(that is, z(y) is identically equal to [x, y] for a suitable x in V) is a subspace
of V'' which is isomorphic to V and which is, therefore, n-dimensional.
But the n-dimensionality of V implies that of V', which in turn implies
that V'' is n-dimensional. It follows that V'' must coincide with the
n-dimensional subspace just described, and the proof of the theorem is 
complete. 
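
The natural correspondence can be imitated with functions that take functions as arguments. The sketch below is an illustration only, under assumptions that are not in the text: vectors are tuples in a rational coordinate space, functionals are ordinary Python functions, and the helper names `coordinate_functional`, `evaluation`, and `recover` are hypothetical.

```python
def coordinate_functional(j):
    """The j-th coordinate functional: y_j(x) = xi_j."""
    return lambda x: x[j]

def evaluation(x0):
    """The element z0 of the second dual attached to x0: z0(y) = y(x0) = [x0, y]."""
    return lambda y: y(x0)

def recover(z0, n):
    """Recover x0 from z0 by applying z0 to the n coordinate functionals."""
    return tuple(z0(coordinate_functional(j)) for j in range(n))

x0 = (2, -1, 5)
z0 = evaluation(x0)

# A sample linear functional, chosen arbitrarily for the check.
y = lambda x: 3 * x[0] + x[2]

print(z0(y) == y(x0))
print(recover(z0, 3) == x0)
```

The second check shows, in this coordinate setting, why distinct vectors give distinct elements of the second dual: the vector can be read back from the values its evaluation functional takes on the coordinate functionals.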

It is important to observe that the theorem shows not only that V and
V'' are isomorphic (this much is trivial from the fact that they have the
same dimension) but that the natural correspondence is an isomorphism.
This property of vector spaces is called reflexivity; every finite-dimensional
vector space is reflexive.

It is frequently convenient to be mildly sloppy about V'': for
finite-dimensional vector spaces we shall identify V'' with V (by the natural
isomorphism), and we shall say that the element z_0 of V'' is the same as
the element x_0 of V whenever z_0(y) = [x_0, y] for all y in V'. In this language
it is very easy to express the relation between a basis X, in V, and the dual
basis of its dual basis, in V''; the symmetry of the relation [x_i, y_j] = δ_ij
shows that X'' = X.


§ 17. Annihilators


DEFINITION. The annihilator S^0 of any subset S of a vector space V
(S need not be a subspace) is the set of all vectors y in V' such that
[x, y] is identically zero for all x in S.


Thus O^0 = V' and V^0 = O (⊂ V'). If V is finite-dimensional and S
contains a non-zero vector, so that S ≠ O, then § 15, Theorem 3 shows
that S^0 ≠ V'.


THEOREM 1. If M is an m-dimensional subspace of an n-dimensional
vector space V, then M^0 is an (n − m)-dimensional subspace of V'.


PROOF. We leave it to the reader to verify that M^0 (in fact S^0, for an
arbitrary S) is always a subspace; we shall prove only the statement
concerning the dimension of M^0.

Let X = {x_1, ..., x_n} be a basis in V whose first m elements are in M
(and form therefore a basis for M); let X' = {y_1, ..., y_n} be the dual
basis in V'. We denote by N the subspace (in V') spanned by y_{m+1}, ..., y_n;
clearly N has dimension n − m. We shall prove that M^0 = N.

If x is any vector in M, then x is a linear combination of x_1, ..., x_m,


x = Σ_{i=1}^m ξ_i x_i,

and, for any j = m + 1, ..., n, we have

[x, y_j] = Σ_{i=1}^m ξ_i[x_i, y_j] = 0.


In other words, y_j is in M^0 for j = m + 1, ..., n; it follows that N is
contained in M^0,

N ⊂ M^0.


Suppose, on the other hand, that y is any element of M^0. Since y, being
in V', is a linear combination of the basis vectors y_1, ..., y_n, we may write


y = Σ_{j=1}^n η_j y_j.

Since, by assumption, y is in M^0, we have, for every i = 1, ..., m,

0 = [x_i, y] = Σ_{j=1}^n η_j[x_i, y_j] = η_i;


in other words, y is a linear combination of y_{m+1}, ..., y_n. This proves
that y is in N, and consequently that


M^0 ⊂ N,

and the theorem follows.
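
The dimension count of Theorem 1 can be observed in coordinates, where the annihilator of a subspace is the solution space of a homogeneous system of linear equations. The sketch below is an illustration under assumptions that are not in the text: the space is a rational coordinate space, functionals are represented by coefficient lists as in § 13, example (1), and the names `rref` and `annihilator_basis` are hypothetical.

```python
from fractions import Fraction

def rref(rows, ncols):
    """Reduced row echelon form over the rationals; returns (non-zero rows, pivot columns)."""
    rows = [[Fraction(c) for c in r] for r in rows]
    pivots = []
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        p = rows[r][col]
        rows[r] = [a / p for a in rows[r]]
        for i in range(len(rows)):
            if i != r:
                f = rows[i][col]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(col)
        r += 1
    return rows[:r], pivots

def annihilator_basis(spanning, n):
    """Basis of all coefficient lists eta with sum(eta_k * xi_k) = 0 for every x in spanning."""
    reduced, pivots = rref(spanning, n)
    free = [c for c in range(n) if c not in pivots]
    basis = []
    for f in free:
        eta = [Fraction(0)] * n
        eta[f] = Fraction(1)
        # Each pivot coordinate is determined by the chosen free coordinate.
        for row, p in zip(reduced, pivots):
            eta[p] = -row[f]
        basis.append(tuple(eta))
    return basis

M = [(1, 1, 0, 0), (0, 0, 1, 1)]      # a 2-dimensional subspace of a 4-dimensional space
M0 = annihilator_basis(M, 4)
print(len(M0) == 4 - 2)
print(all(sum(a * b for a, b in zip(x, y)) == 0 for x in M for y in M0))
```

Here m = 2 and n = 4, and the computed annihilator has n − m = 2 basis elements, each of which vanishes on both spanning vectors of M.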


THEOREM 2. If M is a subspace in a finite-dimensional vector space V,
then M^00 (= (M^0)^0) = M.


PROOF. Observe that we use here the convention, established at the
end of § 16, that identifies V and V''. By definition, M^00 is the set of all
vectors x such that [x, y] = 0 for all y in M^0. Since, by the definition of
M^0, [x, y] = 0 for all x in M and all y in M^0, it follows that M ⊂ M^00.
The desired conclusion now follows from a dimension argument. Let
M be m-dimensional; then the dimension of M^0 is n − m, and that of M^00
is n − (n − m) = m. Hence M = M^00, as was to be proved.


EXERCISES 


1. Define a non-zero linear functional y on C^3 such that if x_1 = (1, 1, 1) and
x_2 = (1, 1, −1), then [x_1, y] = [x_2, y] = 0.


2. The vectors x_1 = (1, 1, 1), x_2 = (1, 1, −1), and x_3 = (1, −1, −1) form a
basis of C^3. If {y_1, y_2, y_3} is the dual basis, and if x = (0, 1, 0), find [x, y_1], [x, y_2],
and [x, y_3].


3. Prove that if y is a linear functional on an n-dimensional vector space V,
then the set of all those vectors x for which [x, y] = 0 is a subspace of V; what is
the dimension of that subspace?


4. If y(x) = £ + & + £: whenever x = (£1, $z, &) is a vector in ©, then y 
is a linear functional on €f; find a basis of the subspace consisting of all those 
vectors x for which [z, y] = 0. 


5. Prove that if m < n, and if y:, ---, ym are linear functionals on an n-di- 
mensional vector space “U, then there exists a non-zero vector z in Ù such that 
(x, y] = 0 for j = 1, ---, m. What does this result say about the solutions of 
linear equations? 


6. Suppose that m <n and that yı, ---, ym are linear functionals on an n- 
dimensional vector space U. Under what conditions on the scalars œi, -+-, am 
is it true that there exists a vector z in U such that [z, yj] = a; for j = 1, ---, m? 
What does this result say about the solutions of linear equations? 


7. If U is an n-dimensional vector space over a finite field, and if 0 S m Sn 
then the number of m-dimensional subspaces of U is the same as the number 
of (n — m)-dimensional subspaces. 


8. (a) Prove that if S is any subset of a finite-dimensional vector space, then
S^{00} coincides with the subspace spanned by S.

(b) If S and T are subsets of a vector space, and if S ⊂ T, then T^0 ⊂ S^0.

(c) If M and N are subspaces of a finite-dimensional vector space, then (M ∩ N)^0
= M^0 + N^0 and (M + N)^0 = M^0 ∩ N^0. (Hint: make repeated use of (b) and of
§ 17, Theorem 2.)

(d) Is the conclusion of (c) valid for not necessarily finite-dimensional vector
spaces?
9. This exercise is concerned with vector spaces that need not be finite-dimen-
sional; most of its parts (but not all) depend on the sort of transfinite reasoning
that is needed to prove that every vector space has a basis (cf. § 7, Ex. 11).

(a) Suppose that f and g are scalar-valued functions defined on a set X; if α
and β are scalars write h = αf + βg for the function defined by h(x) = αf(x) +
βg(x) for all x in X. The set of all such functions is a vector space with respect to
this definition of the linear operations, and the same is true of the set of all finitely
non-zero functions. (A function f on X is finitely non-zero if the set of those elements
x of X for which f(x) ≠ 0 is finite.)

(b) Every vector space is isomorphic to the set of all finitely non-zero functions
on some set.

(c) If V is a vector space with basis X, and if f is a scalar-valued function defined
on the set X, then there exists a unique linear functional y on V such that [x, y]
= f(x) for all x in X.

(d) Use (a), (b), and (c) to conclude that every vector space V is isomorphic to
a subspace of V'.

(e) Which vector spaces are isomorphic to their own duals? 

(f) If Y is a linearly independent subset of a vector space V, then there exists
a basis of V containing Y. (Compare this result with the theorem of § 7.)

(g) If X is a set and if y is an element of X, write f_y for the scalar-valued function
defined on X by writing f_y(x) = 1 or 0 according as x = y or x ≠ y. Let Y be the
set of all functions f_y together with the function g defined by g(x) = 1 for all x
in X. Prove that if X is infinite, then Y is a linearly independent subset of the
vector space of all scalar-valued functions on X.

(h) The natural correspondence from V to V'' is defined for all vector spaces
(not only for the finite-dimensional ones); if x_0 is in V, define the corresponding
element z_0 of V'' by writing z_0(y) = [x_0, y] for all y in V'. Prove that if V is reflexive
(i.e., if every z_0 in V'' can be obtained in this manner by a suitable choice of x_0),
then V is finite-dimensional. (Hint: represent V' as the set of all scalar-valued
functions on some set, and then use (g), (f), and (c) to construct an element of V''
that is not induced by an element of V.)

Warning: the assertion that a vector space is reflexive if and only if it is finite- 
dimensional would shock most of the experts in the subject. The reason is that 
the customary and fruitful generalization of the concept of reflexivity to infinite- 
dimensional spaces is not the simple-minded one given in (h). 


§ 18. Direct sums


We shall study several important general methods of making new vector 
spaces out of old ones; in this section we begin by studying the easiest one. 


DEFINITION. If U and V are vector spaces (over the same field), their
direct sum is the vector space W (denoted by U ⊕ V) whose elements
are all the ordered pairs (x, y) with x in U and y in V, with the linear
operations defined by

    \[ \alpha_1 (x_1, y_1) + \alpha_2 (x_2, y_2) = (\alpha_1 x_1 + \alpha_2 x_2, \; \alpha_1 y_1 + \alpha_2 y_2). \]


We observe that the formation of the direct sum is analogous to the way 
in which the plane is constructed from its two coordinate axes. 
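
The construction of the plane from its two axes can be made concrete. The following is only an illustrative sketch: it assumes that U and V are both copies of the rational line, stores a vector of each as a one-element tuple, and uses a hypothetical helper `combine` for the linear operations of the definition.

```python
from fractions import Fraction

def combine(a1, p1, a2, p2):
    """Return a1*(x1, y1) + a2*(x2, y2) in the direct sum, computed componentwise."""
    (x1, y1), (x2, y2) = p1, p2
    x = tuple(a1 * s + a2 * t for s, t in zip(x1, x2))
    y = tuple(a1 * s + a2 * t for s, t in zip(y1, y2))
    return (x, y)

# U = horizontal axis, V = vertical axis; the direct sum models the plane.
p = ((Fraction(3),), (Fraction(0),))   # the vector x = 3 of U, written as (x, 0)
q = ((Fraction(0),), (Fraction(5),))   # the vector y = 5 of V, written as (0, y)
z = combine(1, p, 1, q)                # (x, 0) + (0, y) = (x, y)
```

Every element of the direct sum arises in this way as (x, 0) + (0, y), which is the observation used in the theorem of this section.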
We proceed to investigate the relation of this notion to some of our
earlier ones.

The set of all vectors (in W) of the form (x, 0) is a subspace of W; the
correspondence (x, 0) ↔ x shows that this subspace is isomorphic to U.
It is convenient, once more, to indulge in a logical inaccuracy and,
identifying x and (x, 0), to speak of U as a subspace of W. Similarly, of course,
the vectors y of V may be identified with the vectors of the form (0, y)
in W, and we may consider V as a subspace of W. This terminology
is, to be sure, not quite exact, but the logical difficulty is much easier to
get around here than it was in the case of the second dual space. We could
have defined the direct sum of U and V (at least in the case in which U
and V have no non-zero vectors in common) as the set consisting of all
x's in U, all y's in V, and all those pairs (x, y) for which x ≠ 0 and y ≠ 0.
This definition yields a theory analogous in every detail to the one we
shall develop, but it makes it a nuisance to prove theorems because of the
case distinctions it necessitates. It is clear, however, that from the point
of view of this definition U is actually a subset of U ⊕ V. In this sense
then, or in the isomorphism sense of the definition we did adopt, we raise
the question: what is the relation between U and V when we consider these
spaces as subspaces of the big space W?


THEOREM. If U and V are subspaces of a vector space W, then the following
three conditions are equivalent.

(1) W = U ⊕ V.

(2) U ∩ V = 0 and U + V = W (i.e., U and V are complements of
each other).

(3) Every vector z in W may be written in the form z = x + y, with
x in U and y in V, in one and only one way.


PROOF. We shall prove the implications (1) ⇒ (2) ⇒ (3) ⇒ (1).

(1) ⇒ (2). We assume that W = U ⊕ V. If z = (x, y) lies in both
U and V, then x = y = 0, so that z = 0; this proves that U ∩ V = 0.
Since the representation z = (x, 0) + (0, y) is valid for every z, it follows
also that U + V = W.

(2) ⇒ (3). If we assume (2), so that, in particular, U + V = W, then
it is clear that every z in W has the desired representation, z = x + y.
To prove uniqueness, we assume that z = x_1 + y_1 and z = x_2 + y_2, with
x_1 and x_2 in U and y_1 and y_2 in V. Since x_1 + y_1 = x_2 + y_2, it follows
that x_1 - x_2 = y_2 - y_1. Since the left member of this last equation is
in U and the right member is in V, the disjointness of U and V implies
that x_1 = x_2 and y_1 = y_2.

(3) ⇒ (1). This implication is practically indistinguishable from the
definition of direct sum. If we form the direct sum U ⊕ V, and then
identify (x, 0) and (0, y) with x and y respectively, we are committed to
identifying the sum (x, y) = (x, 0) + (0, y) with what we are assuming
to be the general element z = x + y of W; from the hypothesis that the
representation of z in the form x + y is unique we conclude that the
correspondence between (x, 0) and x (and also between (0, y) and y) is
one-to-one.
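
Condition (3) can be seen at work in the plane. The sketch below is an illustration under assumptions that are not in the text: W is the rational plane, U and V are the lines spanned by two given non-zero integer vectors u and v, and the function name `decompose` is hypothetical.

```python
from fractions import Fraction

def decompose(z, u, v):
    """Write z = x + y with x in U = span{u} and y in V = span{v}.

    Cramer's rule is used; the coefficients are uniquely determined exactly
    when the determinant is non-zero, that is, when u and v are independent.
    """
    det = u[0] * v[1] - u[1] * v[0]
    if det == 0:
        raise ValueError("span{u} and span{v} are not complementary")
    s = Fraction(z[0] * v[1] - z[1] * v[0], det)   # coefficient of u
    t = Fraction(u[0] * z[1] - u[1] * z[0], det)   # coefficient of v
    x = (s * u[0], s * u[1])
    y = (t * v[0], t * v[1])
    return x, y

# U is the horizontal axis, V is the diagonal line through (1, 1).
x, y = decompose((5, 2), (1, 0), (1, 1))
```

Here z = (5, 2) splits as x = (3, 0) plus y = (2, 2), and no other split with x in U and y in V is possible; if u and v are parallel the two lines coincide, condition (2) fails, and the sketch raises an error.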

If two subspaces U and V in a vector space W are disjoint and span
W (that is, if they satisfy (2)), it is usual to say that W is the internal
direct sum of U and V; symbolically, as before, W = U ⊕ V. If we want
to emphasize the distinction between this concept and the one defined
before, we describe the earlier one by saying that W is the external direct
sum of U and V. In view of the natural isomorphisms discussed above,
and, especially, in view of the preceding theorem, the distinction is more
pedantic than conceptual. In accordance with our identification
convention, we shall usually ignore it.


§ 19. Dimension of a direct sum 


What can be said about the dimension of a direct sum? If U is
n-dimensional, V is m-dimensional, and W = U ⊕ V, what is the dimension
of W? This question is easy to answer.


THEOREM 1. The dimension of a direct sum is the sum of the dimensions 
of its summands. 


PROOF. We assert that if {x_1, ..., x_n} is a basis in U, and if {y_1, ..., y_m}
is a basis in V, then the set {x_1, ..., x_n, y_1, ..., y_m} (or, more precisely,
the set {(x_1, 0), ..., (x_n, 0), (0, y_1), ..., (0, y_m)}) is a basis in W. The
easiest proof of this assertion is to use the implication (1) ⇒ (3) from
the theorem of the preceding section. Since every z in W may be written
in the form z = x + y, where x is a linear combination of x_1, ..., x_n and
y is a linear combination of y_1, ..., y_m, it follows that our set does indeed
span W. To show that the set is also linearly independent, suppose that

    \[ \alpha_1 x_1 + \cdots + \alpha_n x_n + \beta_1 y_1 + \cdots + \beta_m y_m = 0. \]

The uniqueness of the representation of 0 in the form x + y implies that

    \[ \alpha_1 x_1 + \cdots + \alpha_n x_n = \beta_1 y_1 + \cdots + \beta_m y_m = 0, \]

and hence the linear independence of the x's and of the y's implies that

    \[ \alpha_1 = \cdots = \alpha_n = \beta_1 = \cdots = \beta_m = 0. \]
THEOREM 2. If W is any (n + m)-dimensional vector space, and if U
is any n-dimensional subspace of W, then there exists an m-dimensional
subspace V in W such that W = U ⊕ V.


PROOF. Let {x_1, ..., x_n} be any basis in U; by the theorem of § 7 we
may find a set {y_1, ..., y_m} of vectors in W with the property that {x_1,
..., x_n, y_1, ..., y_m} is a basis in W. Let V be the subspace spanned by
y_1, ..., y_m; we omit the verification that W = U ⊕ V.

Theorem 2 says that every subspace of a finite-dimensional vector space 
has a complement. 
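
The proof of Theorem 2 is constructive once a way of extending a basis is fixed. As a hedged illustration (one possible procedure, not the method of § 7 itself), the sketch below assumes W is the space of rational n-tuples and extends a basis of U greedily by coordinate vectors; the names `rank` and `complement` are hypothetical.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of vectors, by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in r] for r in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def complement(basis_u, n):
    """Extend a basis of U by coordinate vectors of Q^n; return the added vectors."""
    current, added = list(basis_u), []
    for k in range(n):
        e = [1 if i == k else 0 for i in range(n)]
        if rank(current + [e]) > rank(current):   # e is independent of what we have
            current.append(e)
            added.append(e)
    return added

basis_u = [[1, 1, 0], [0, 1, 1]]     # U is 2-dimensional in a 3-dimensional W
basis_v = complement(basis_u, 3)     # spans a 1-dimensional complement V
```

A different order of trial vectors generally produces a different complement, which illustrates that the complement in Theorem 2 is far from unique.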


§ 20. Dual of a direct sum 


In most of what follows we shall view the notion of direct sum as defined 
for subspaces of a vector space V; this avoids the fuss with the identification
convention of § 18, and it turns out, incidentally, to be the more useful 
concept for our later work. We conclude, for the present, our study of 
direct sums, by observing the simple relation connecting dual spaces, 
annihilators, and direct sums. To emphasize our present view of direct 
summation, we return to the letters of our earlier notation. 


THEOREM. If M and N are subspaces of a vector space V, and if V = M ⊕ N,
then M' is isomorphic to N^0 and N' to M^0, and V' = M^0 ⊕ N^0.


PROOF. To simplify the notation we shall use, throughout this proof,
x, x', and x^0 for elements of M, M', and M^0, respectively, and we reserve,
similarly, the letters y for N and z for V. (This notation is not meant to
suggest that there is any particular relation between, say, the vectors
x in M and the vectors x' in M'.)

If z' belongs to both M^0 and N^0, i.e., if z'(x) = z'(y) = 0 for all x and
y, then z'(z) = z'(x + y) = 0 for all z; this implies that M^0 and N^0 are
disjoint. If, moreover, z' is any vector in V', and if z = x + y, we write
x^0(z) = z'(y) and y^0(z) = z'(x). It is easy to see that the functions x^0
and y^0 thus defined are linear functionals on V (i.e., elements of V')
belonging to M^0 and N^0 respectively; since z' = x^0 + y^0, it follows that V' is
indeed the direct sum of M^0 and N^0.

To establish the asserted isomorphisms, we make correspond to every
x^0 a y' in N' defined by y'(y) = x^0(y). We leave to the reader the routine
verification that the correspondence x^0 → y' is linear and one-to-one,
and therefore an isomorphism between M^0 and N'; the corresponding
result for N^0 and M' follows from symmetry by interchanging x and y.
(Observe that for finite-dimensional vector spaces the mere existence of
an isomorphism between, say, M^0 and N' is trivial from a dimension
argument; indeed, the dimensions of both M^0 and N' are equal to the
dimension of N.)
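
The splitting z' = x^0 + y^0 used in the proof can be traced in a small case. The sketch below is an illustration only, under assumptions not in the text: V is the plane, M is the horizontal axis, N is the diagonal line through (1, 1), and the names `split`, `functional`, `x0`, and `y0` are hypothetical.

```python
def split(z):
    """Decompose z = x + y with x in M = span{(1, 0)} and y in N = span{(1, 1)}."""
    y = (z[1], z[1])
    x = (z[0] - z[1], 0)
    return x, y

def functional(a, b):
    """The linear functional z'(xi1, xi2) = a*xi1 + b*xi2 on the plane."""
    return lambda z: a * z[0] + b * z[1]

z_prime = functional(2, 5)
x0 = lambda z: z_prime(split(z)[1])   # x0(z) = z'(y); it vanishes on M
y0 = lambda z: z_prime(split(z)[0])   # y0(z) = z'(x); it vanishes on N
```

Evaluating at any vector z shows x0(z) + y0(z) = z'(z), while x0 vanishes on M and y0 vanishes on N, which is exactly the decomposition of V' asserted in the theorem.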

We remark, concerning our entire presentation of the theory of direct 
sums, that there is nothing magic about the number two; we could have 
defined the direct sum of any finite number of vector spaces, and we could 
have proved the obvious analogues of all the theorems of the last three 
sections, with only the notation becoming more complicated. We serve 
warning that we shall use this remark later and treat the theorems it implies 
as if we had proved them. 


EXERCISES 


1. Suppose that x, y, u, and v are vectors in C^4; let M and N be the subspaces of
C^4 spanned by {x, y} and {u, v} respectively. In which of the following cases is it
true that C^4 = M ⊕ N?


(a) x = (1, 1, 0, 0),  y = (1, 0, 1, 0),
    u = (0, 1, , ),    v = (0, 0, 1, 1).
(b) x = (-1, 1, 1, 0), y = (0, 1, -1, 1),
    u = (1, 0, 0, 0),  v = (0, 0, 0, 1).
(c) x = (1, 0, 0, 1),  y = (0, 1, 1, 0),
    u = (1, 0, 1, 0),  v = (0, 1, 0, 1).


2. If M is the subspace consisting of all those vectors (ξ_1, ..., ξ_n, ξ_{n+1}, ...,
ξ_{2n}) in C^{2n} for which ξ_1 = ... = ξ_n = 0, and if N is the subspace of all those
vectors for which ξ_j = ξ_{n+j}, j = 1, ..., n, then C^{2n} = M ⊕ N.


3. Construct three subspaces M, N_1, and N_2 of a vector space V so that M ⊕ N_1
= M ⊕ N_2 = V but N_1 ≠ N_2. (Note that this means that there is no cancellation
law for direct sums.) What is the geometric picture corresponding to this situation?


4. (a) If U, V, and W are vector spaces, what is the relation between U ⊕ (V
⊕ W) and (U ⊕ V) ⊕ W (i.e., in what sense is the formation of direct sums an
associative operation)?

(b) In what sense is the formation of direct sums commutative? 


5. (a) Three subspaces L, M, and N of a vector space V are called independent
if each one is disjoint from the sum of the other two. Prove that a necessary and
sufficient condition for V = L ⊕ (M ⊕ N) (and also for V = (L ⊕ M) ⊕ N) is that
L, M, and N be independent and that V = L + M + N. (The subspace L + M
+ N is the set of all vectors of the form x + y + z, with x in L, y in M, and
z in N.)

(b) Give an example of three subspaces of a vector space V, such that the sum
of all three is V, such that every two of the three are disjoint, but such that the
three are not independent.

(c) Suppose that x, y, and z are elements of a vector space and that L, M, and
N are the subspaces spanned by x, y, and z, respectively. Prove that the vectors
x, y, and z are linearly independent if and only if the subspaces L, M, and N are
independent.

(d) Prove that three finite-dimensional subspaces are independent if and only 
if the sum of their dimensions is equal to the dimension of their sum. 

(e) Generalize the results (a)-(d) from three subspaces to any finite number. 
§ 21. Quotient spaces


We know already that if M is a subspace of a vector space V, then there
are, usually, many other subspaces N in V such that M ⊕ N = V. There
is no natural way of choosing one from among the wealth of complements
of M. There is, however, a natural construction that associates with M
and V a new vector space that, for all practical purposes, plays the role of
a complement of M. The theoretical advantage that the construction has
over the formation of an arbitrary complement is precisely its "natural"
character, i.e., the fact that it does not depend on choosing a basis, or, for
that matter, on choosing anything at all.

In order to understand the construction it is a good idea to keep a picture
in mind. Suppose, for instance, that V = R^2 (the real coordinate plane)
and that M consists of all those vectors (ξ_1, ξ_2) for which ξ_2 = 0 (the
horizontal axis). Each complement of M is a line (other than the horizontal
axis) through the origin. Observe that each such complement has the
property that it intersects every horizontal line in exactly one point. The
idea of the construction we shall describe is to make a vector space out of
the set of all horizontal lines.

We begin by using M to single out certain subsets of V. (We are back
in the general case now.) If x is an arbitrary vector in V, we write x + M
for the set of all sums x + y with y in M; each set of the form x + M is
called a coset of M. (In the case of the plane-line example above, the
cosets are the horizontal lines.) Note that one and the same coset can arise
from two different vectors, i.e., that even if x ≠ y, it is possible that
x + M = y + M. It makes good sense, just the same, to speak of a
coset, say H, of M, without specifying which element (or elements) H
comes from; to say that H is a coset (of M) means simply that there is at
least one x such that H = x + M.

If H and K are cosets (of M), we write H + K for the set of all sums
u + v with u in H and v in K; we assert that H + K is also a coset of M.
Indeed, if H = x + M and K = y + M, then every element of H + K
belongs to the coset (x + y) + M (note that M + M = M), and,
conversely, every element of (x + y) + M is in H + K. (If, for instance, z
is in M, then (x + y) + z = (x + z) + (y + 0).) In other words, H + K
= (x + y) + M, so that H + K is a coset, as asserted. We leave to the
reader the verification that coset addition is commutative and associative.
The coset M (i.e., 0 + M) is such that H + M = H for every coset H,
and, moreover, M is the only coset with this property. (If (x + M)
+ (y + M) = x + M, then x + M contains x + y, so that x + y = x + u
for some u in M; this implies that y is in M, and hence that y + M = M.)
If H is a coset, then the set consisting of all the vectors -u, with u in H,
is itself a coset, which we shall denote by -H. The coset -H is such
that H + (-H) = M, and, moreover, -H is the only coset with this
property. To sum up: the addition of cosets satisfies the axioms (A) of
§ 2.

If H is a coset and if α is a scalar, we write αH for the set consisting of
all the vectors αu with u in H in case α ≠ 0; the coset 0·H is defined to be
M. A simple verification shows that this concept of multiplication satisfies
the axioms (B) and (C) of § 2.

The set of all cosets has thus been proved to be a vector space with respect 
to the linear operations defined above. This vector space is called the 
quotient space of V modulo M; it is denoted by V/M.
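
The plane-line picture lends itself to a direct model. The following is a minimal sketch under assumptions not made in the text: V is the rational plane, M is the horizontal axis, and each coset x + M (a horizontal line) is stored through its height, the second coordinate of x; the class name `Coset` and its method names are hypothetical.

```python
from fractions import Fraction

class Coset:
    """The coset x + M in the rational plane, where M is the horizontal axis."""

    def __init__(self, x):
        # x + M depends only on the second coordinate of x,
        # so that coordinate identifies the coset.
        self.height = Fraction(x[1])

    def __eq__(self, other):
        return self.height == other.height

    def __add__(self, other):      # (x + M) + (y + M) = (x + y) + M
        return Coset((0, self.height + other.height))

    def __neg__(self):             # -(x + M) = (-x) + M
        return Coset((0, -self.height))

    def scale(self, a):            # a(x + M) = (a x) + M; for a = 0 this is M
        return Coset((0, a * self.height))

M = Coset((0, 0))                  # the zero element of the quotient space
H = Coset((1, 2))
K = Coset((-7, 2))                 # a different vector, but the same coset
```

The cosets H and K above coincide although they come from different vectors, H + M = H, and H + (-H) = M, as the discussion of coset addition requires.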


§ 22. Dimension of a quotient space 


THEOREM 1. If M and N are complementary subspaces of a vector space
V, then the correspondence that assigns to each vector y in N the coset y + M
is an isomorphism between N and V/M.


PROOF. If y_1 and y_2 are elements of N such that y_1 + M = y_2 + M,
then, in particular, y_1 belongs to y_2 + M, so that y_1 = y_2 + x for some
x in M. Since this means that y_1 - y_2 = x, and since M and N are
disjoint, it follows that x = 0, and hence that y_1 = y_2. (Recall that y_1 - y_2
belongs to N along with y_1 and y_2.) This argument proves that the
correspondence we are studying is one-to-one, as far as it goes. To prove that
it goes far enough, consider an arbitrary coset of M, say z + M. Since
V = N + M, we may write z in the form y + x, with x in M and y in N;
it follows (since x + M = M) that z + M = y + M. This proves that
every coset of M can be obtained by using an element of N (and not just
any old element of V); consequently y → y + M is indeed a one-to-one
correspondence between N and V/M. The linear property of the
correspondence is immediate from the definition of the linear operations in
V/M; indeed, we have

    \[ (\alpha_1 y_1 + \alpha_2 y_2) + M = \alpha_1 (y_1 + M) + \alpha_2 (y_2 + M). \]


THEOREM 2. If M is an m-dimensional subspace of an n-dimensional
vector space V, then V/M has dimension n - m.


PROOF. Use § 19, Theorem 2 to find a subspace N so that M ⊕ N = V.
The space N has dimension n - m (by § 19, Theorem 1), and it is
isomorphic to V/M (by Theorem 1 above).

There are more topics in the theory of quotient spaces that we could 
discuss (such as their relation to dual spaces and annihilators). Since, 
however, most such topics are hardly more than exercises, involving the 
use of techniques already at our disposal, we turn instead to some new and
non-obvious ways of manufacturing useful vector spaces. 


EXERCISES 


1. Consider the quotient spaces obtained by reducing the space P of polynomials
modulo various subspaces. If M = P_n, is P/M finite-dimensional? What if M
is the subspace consisting of all even polynomials? What if M is the subspace
consisting of all polynomials divisible by x_n (where x_n(t) = t^n)?


2. If S and T are arbitrary subsets of a vector space (not necessarily cosets of a
subspace), there is nothing to stop us from defining S + T just as addition was
defined for cosets, and, similarly, we may define αS (where α is a scalar). If the
class of all subsets of a vector space is endowed with these "linear operations,"
which of the axioms of a vector space are satisfied?


3. (a) Suppose that M is a subspace of a vector space V. Two vectors x and y
of V are congruent modulo M, in symbols x ≡ y (M), if x - y is in M. Prove that
congruence modulo M is an equivalence relation, i.e., that it is reflexive (x ≡ x),
symmetric (if x ≡ y, then y ≡ x), and transitive (if x ≡ y and y ≡ z, then x ≡ z).

(b) If α_1 and α_2 are scalars, and if x_1, x_2, y_1, and y_2 are vectors such that x_1 ≡ y_1
(M) and x_2 ≡ y_2 (M), then α_1 x_1 + α_2 x_2 ≡ α_1 y_1 + α_2 y_2 (M).

(c) Congruence modulo M splits V into equivalence classes, i.e., into sets such
that two vectors belong to the same set if and only if they are congruent. Prove
that a subset of V is an equivalence class modulo M if and only if it is a coset of M.


4. (a) Suppose that M is a subspace of a vector space V. Corresponding to
every linear functional y on V/M (i.e., to every element y of (V/M)'), there is a
linear functional z on V (i.e., an element of V'); the linear functional z is defined
by z(x) = y(x + M). Prove that the correspondence y → z is an isomorphism
between (V/M)' and M^0.

(b) Suppose that M is a subspace of a vector space V. Corresponding to every
coset y + M^0 of M^0 in V' (i.e., to every element H of V'/M^0), there is a linear
functional z on M (i.e., an element z of M'); the linear functional z is defined by
z(x) = y(x). Prove that z is unambiguously determined by the coset H (that is,
it does not depend on the particular choice of y), and that the correspondence
H → z is an isomorphism between V'/M^0 and M'.


5. Given a finite-dimensional vector space V, form the direct sum W = V ⊕ V',
and prove that the correspondence (x, y) → (y, x) is an isomorphism between
W and W'.


§ 23. Bilinear forms 


If U and V are vector spaces (over the same field), then their direct sum
W = U ⊕ V is another vector space; we propose to study certain functions
on W. (For present purposes the original definition of U ⊕ V, via ordered
pairs, is the convenient one.) The value of such a function, say w, at an
element (x, y) of W will be denoted by w(x, y). The study of linear
functions on W is no longer of much interest to us; the principal facts
concerning them were discussed in § 20. The functions we want to consider
now are the bilinear ones; they are, by definition, the scalar-valued
functions on W with the property that for each fixed value of either argument
they depend linearly on the other argument. More precisely, a
scalar-valued function w on W is a bilinear form (or bilinear functional) if

    \[ w(\alpha_1 x_1 + \alpha_2 x_2, y) = \alpha_1 w(x_1, y) + \alpha_2 w(x_2, y) \]

and

    \[ w(x, \alpha_1 y_1 + \alpha_2 y_2) = \alpha_1 w(x, y_1) + \alpha_2 w(x, y_2), \]

identically in the vectors and scalars involved.

In one special situation we have already encountered bilinear functionals.
If, namely, V is the dual space of U, V = U', and if we write w(x, y) = [x, y]
(see § 14), then w is a bilinear functional on U ⊕ U'. For an example in
a more general situation, let U and V be arbitrary vector spaces (over the
same field, as always), let u and v be elements of U' and V' respectively,
and write w(x, y) = u(x)v(y) for all x in U and y in V. An even more
general example is obtained by selecting a finite number of elements in
U', say u_1, ..., u_k, selecting the same finite number of elements in V',
say v_1, ..., v_k, and writing w(x, y) = u_1(x)v_1(y) + ... + u_k(x)v_k(y). Which
of the words, "functional" or "form," is used depends somewhat on the
context and, somewhat more, on the user's whim. In this book we shall
generally use "functional" with "linear" and "form" with "bilinear" (and
its higher-dimensional generalizations).

If w_1 and w_2 are bilinear forms on W, and if α_1 and α_2 are scalars, we
write w for the function on W defined by

    \[ w(x, y) = \alpha_1 w_1(x, y) + \alpha_2 w_2(x, y). \]

It is easy to see that w is a bilinear form; we denote it by α_1 w_1 + α_2 w_2.
With this definition of the linear operations, the set of all bilinear forms
on W is a vector space. The chief purpose of the remainder of this section
is to determine (in the finite-dimensional case) how the dimension of this
space depends on the dimensions of U and V.


THEOREM 1. If U is an n-dimensional vector space with basis {x_1, ..., x_n},
if V is an m-dimensional vector space with basis {y_1, ..., y_m}, and if
{α_ij} is any set of nm scalars (i = 1, ..., n; j = 1, ..., m), then there is
one and only one bilinear form w on U ⊕ V such that w(x_i, y_j) = α_ij for
all i and j.


PROOF. If x = Σ_i ξ_i x_i, y = Σ_j η_j y_j, and w is a bilinear form on U ⊕ V
such that w(x_i, y_j) = α_ij, then

    \[ w(x, y) = \sum_i \sum_j \xi_i \eta_j w(x_i, y_j) = \sum_i \sum_j \xi_i \eta_j \alpha_{ij}. \]

From this equation the uniqueness of w is clear; the existence of a suitable
w is proved by reading the same equation from right to left, that is,
defining w by it. (Compare this result with § 15, Theorem 1.)
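
Reading the equation from right to left amounts to building w from a table of scalars. The sketch below is one hedged illustration: it assumes vectors are given by their coordinate tuples with respect to the chosen bases, and the name `bilinear_form` is hypothetical.

```python
def bilinear_form(alpha):
    """Return w with w(x, y) = sum_i sum_j xi_i * eta_j * alpha[i][j]."""
    def w(x, y):
        return sum(x[i] * y[j] * alpha[i][j]
                   for i in range(len(x)) for j in range(len(y)))
    return w

alpha = [[1, 2, 0],
         [0, -1, 3]]                    # prescribed values, n = 2 and m = 3
w = bilinear_form(alpha)
e = [(1, 0), (0, 1)]                    # coordinates of the basis vectors x_1, x_2
f = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]   # coordinates of the basis vectors y_1, y_2, y_3
```

Evaluating w(e[i], f[j]) returns alpha[i][j] for every i and j, which is the defining property w(x_i, y_j) = α_ij of the theorem.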


THEOREM 2. If U is an n-dimensional vector space with basis {x_1, ..., x_n},
and if V is an m-dimensional vector space with basis {y_1, ..., y_m}, then
there is a uniquely determined basis {w_pq} (p = 1, ..., n; q = 1, ..., m)
in the vector space of all bilinear forms on U ⊕ V with the property that
w_pq(x_i, y_j) = δ_ip δ_jq. Consequently the dimension of the space of bilinear
forms on U ⊕ V is the product of the dimensions of U and V.


PROOF. Using Theorem 1, we determine w_pq (for each fixed p and q)
by the given condition w_pq(x_i, y_j) = δ_ip δ_jq. The bilinear forms so
determined are linearly independent, since

    \[ \sum_p \sum_q \alpha_{pq} w_{pq} = 0 \]

implies that

    \[ 0 = \sum_p \sum_q \alpha_{pq} \delta_{ip} \delta_{jq} = \alpha_{ij}. \]

If, moreover, w is an arbitrary bilinear form on U ⊕ V, and if w(x_i, y_j) = α_ij, then
w = Σ_p Σ_q α_pq w_pq. Indeed, if x = Σ_i ξ_i x_i and y = Σ_j η_j y_j, then

    \[ w_{pq}(x, y) = \sum_i \sum_j \xi_i \eta_j \delta_{ip} \delta_{jq} = \xi_p \eta_q, \]

and, consequently,

    \[ w(x, y) = \sum_i \sum_j \xi_i \eta_j \alpha_{ij} = \sum_p \sum_q \alpha_{pq} w_{pq}(x, y). \]

It follows that the w_pq form a basis in the space of bilinear forms; this
completes the proof of the theorem. (Compare this result with § 15,
Theorem 2.)
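
The expansion of a bilinear form in the basis {w_pq} can be verified numerically. This is a minimal sketch with assumed conventions: vectors are coordinate tuples, indices start at 0 rather than 1, and the names `w_pq` and `expand` are illustrative.

```python
def w_pq(p, q):
    """The form with w_pq(x_i, y_j) = delta_ip * delta_jq, so w_pq(x, y) = xi_p * eta_q."""
    return lambda x, y: x[p] * y[q]

def expand(alpha):
    """The linear combination sum_p sum_q alpha[p][q] * w_pq as a function of (x, y)."""
    n, m = len(alpha), len(alpha[0])
    return lambda x, y: sum(alpha[p][q] * w_pq(p, q)(x, y)
                            for p in range(n) for q in range(m))

alpha = [[1, 2, 0],
         [0, -1, 3]]                 # n = 2, m = 3, hence nm = 6 basis forms
w = expand(alpha)
x, y = (4, -2), (1, 0, 5)
# The same value computed directly from the coordinates and the table alpha.
direct = sum(x[i] * y[j] * alpha[i][j] for i in range(2) for j in range(3))
```

Both computations give the same number, and the six forms w_pq are exactly as many as the product of the dimensions predicts.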


EXERCISES 


1. (a) If w is a bilinear form on R^n ⊕ R^n, then there exist scalars α_ij, i, j = 1, ...,
n, such that if x = (ξ_1, ..., ξ_n) and y = (η_1, ..., η_n), then w(x, y) = Σ_i Σ_j α_ij ξ_i η_j.
The scalars α_ij are uniquely determined by w.

(b) If z is a linear functional on the space of all bilinear forms on R^n ⊕ R^n, then
there exist scalars β_ij such that (in the notation of (a)) z(w) = Σ_i Σ_j α_ij β_ij for
every w. The scalars β_ij are uniquely determined by z.


2. A bilinear form w on U ⊕ V is degenerate if, as a function of one of its two
arguments, it vanishes identically for some non-zero value of its other argument;
otherwise it is non-degenerate.

(a) Give an example of a degenerate bilinear form (not identically zero) on
C^2 ⊕ C^2.

(b) Give an example of a non-degenerate bilinear form on C^2 ⊕ C^2.
3. If w is a bilinear form on U ⊕ V, if y_0 is in V, and if a function y is defined on
U by y(x) = w(x, y_0), then y is a linear functional on U. Is it true that if w is
non-degenerate, then every linear functional on U can be obtained this way (by a suitable
choice of y_0)?


4. Suppose that for each x and y in P_n the function w is defined by

(a) \( w(x, y) = \int_0^1 x(t) y(t) \, dt, \)
(b) \( w(x, y) = x(1) + y(1), \)
(c) \( w(x, y) = x(1) \cdot y(1), \)
(d) \( w(x, y) = x(1) \left( \frac{dy}{dt} \right)_{t=1}. \)

In which of these cases is w a bilinear form on P_n ⊕ P_n? In which cases is it
non-degenerate?


5. Does there exist a vector space V and a bilinear form w on V ⊕ V such that
w is not identically zero but w(x, x) = 0 for every x in V?


6. (a) A bilinear form w on V ⊕ V is symmetric if w(x, y) = w(y, x) for all x and y.
A quadratic form on V is a function q on V obtained from a bilinear form w by writing
q(x) = w(x, x). Prove that if the characteristic of the underlying scalar field is
different from 2, then every symmetric bilinear form is uniquely determined by
the corresponding quadratic form. What happens if the characteristic is 2?

(b) Can a non-symmetric bilinear form define the same quadratic form as a 
symmetric one? 


§ 24. Tensor products 


In this section we shall describe a new method of putting two vector 
spaces together to make a third, namely, the formation of their tensor 
product. Although we shall have relatively little occasion to make use of 
tensor products in this book, their theory is closely allied to some of the 
subjects we shall treat, and it is useful in other related parts of mathe- 
matics, such as the theory of group representations and the tensor calculus. 
The notion is essentially more complicated than that of direct sum; we 
shall therefore begin by giving some examples of what a tensor product 
should be, and the study of these examples will guide us in laying down the 
definition. 

Let U be the set of all polynomials in one variable s, with, say, complex
coefficients; let V be the set of all polynomials in another variable t; and,
finally, let W be the set of all polynomials in the two variables s and t.
With respect to the obvious definitions of the linear operations, U, V, and
W are all complex vector spaces; in this case we should like to call W, or
something like it, the tensor product of U and V. One reason for this
terminology is that if we take any x in U and any y in V, we may form
their product, that is, the element z of W defined by z(s, t) = x(s)y(t).
(This is the ordinary product of two polynomials. Here, as before, we are
doggedly ignoring the irrelevant fact that we may even multiply together
two elements of U, that is, that the product of two polynomials in the same
variable is another polynomial in that variable. Vector spaces in which a
decent concept of multiplication is defined are called algebras, and their
study, as such, lies outside the scope of this book.)

In the preceding example we considered vector spaces whose elements
are functions. We may, if we wish, consider the simple vector space $\mathbb{C}^n$ as
a collection of functions also; the domain of definition of the functions is,
in this case, a set consisting of exactly $n$ points, say the first $n$ (strictly)
positive integers. In other words, a vector $(\xi_1, \dots, \xi_n)$ may be considered
as a function $\xi$ whose value $\xi(i)$ is defined for $i = 1, \dots, n$; the definition
of the vector operations in $\mathbb{C}^n$ is such that they correspond, in the new
notation, to the ordinary operations performed on the functions $\xi$. If,
simultaneously, we consider $\mathbb{C}^m$ as the collection of functions $\eta$ whose value $\eta(j)$
is defined for $j = 1, \dots, m$, then we should like the tensor product of $\mathbb{C}^n$
and $\mathbb{C}^m$ to be the set of all functions $\zeta$ whose value $\zeta(i, j)$ is defined for
$i = 1, \dots, n$ and $j = 1, \dots, m$. The tensor product, in other words, is
the collection of all functions defined on a set consisting of exactly $nm$
objects, and therefore naturally isomorphic to $\mathbb{C}^{nm}$. This example brings out
a property of tensor products, namely the multiplicativity of dimension,
that we should like to retain in the general case.
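
The function point of view can be tried out numerically. The following is only a minimal sketch in Python: the helper name `product` and the sample values are assumptions made for illustration, not notation from the text. A vector is stored as a dictionary from the index set to its values, and the product of two such functions is a function on the $nm$ index pairs.

```python
# Vectors of C^n and C^m viewed as functions on {1..n} and {1..m}.
def product(xi, eta):
    """Return zeta with zeta(i, j) = xi(i) * eta(j), stored as a dict."""
    return {(i, j): xi[i] * eta[j] for i in xi for j in eta}

xi = {1: complex(1, 2), 2: complex(3, 0), 3: complex(0, -1)}   # n = 3
eta = {1: complex(2, 0), 2: complex(0, 1)}                     # m = 2
zeta = product(xi, eta)

# zeta is defined on exactly n * m index pairs.
num_points = len(zeta)
```

Here `num_points` equals `len(xi) * len(eta)`, which is the multiplicativity of dimension in this concrete case.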

Let us now try to abstract the most important properties of these
examples. The definition of direct sum was one possible rigorization of the crude
intuitive idea of writing down, formally, the sum of two vectors belonging
to different vector spaces. Similarly, our examples suggest that the tensor
product $\mathcal{U} \otimes \mathcal{V}$ of two vector spaces $\mathcal{U}$ and $\mathcal{V}$ should be such that to every
$x$ in $\mathcal{U}$ and $y$ in $\mathcal{V}$ there corresponds a "product" $z = x \otimes y$ in $\mathcal{U} \otimes \mathcal{V}$, in
such a way that the correspondence between $x$ and $z$, for each fixed $y$, as
well as the correspondence between $y$ and $z$, for each fixed $x$, is linear.
(This means, of course, that $(\alpha_1 x_1 + \alpha_2 x_2) \otimes y$ should be equal to
$\alpha_1(x_1 \otimes y) + \alpha_2(x_2 \otimes y)$, and that a similar equation should hold for
$x \otimes (\alpha_1 y_1 + \alpha_2 y_2)$.) To put it more simply, $x \otimes y$ should define a bilinear
(vector-valued) function of $x$ and $y$.

The notion of formal multiplication suggests also that if $u$ and $v$ are
linear functionals on $\mathcal{U}$ and $\mathcal{V}$ respectively, then it is their product $w$,
defined by $w(x, y) = u(x)v(y)$, that should be in some sense the general
element of the dual space $(\mathcal{U} \otimes \mathcal{V})'$. Observe that this product is a bilinear
(scalar-valued) function of $x$ and $y$.


§ 25. Product bases


After one more word of preliminary explanation we shall be ready to
discuss the formal definition of tensor products. It turns out to be
technically preferable to get at $\mathcal{U} \otimes \mathcal{V}$ indirectly, by defining it as the dual of
another space; we shall make tacit use of reflexivity to obtain $\mathcal{U} \otimes \mathcal{V}$
itself. Since we have proved reflexivity for finite-dimensional spaces only,
we shall restrict the definition to such spaces.


DEFINITION. The tensor product $\mathcal{U} \otimes \mathcal{V}$ of two finite-dimensional vector
spaces $\mathcal{U}$ and $\mathcal{V}$ (over the same field) is the dual of the vector space of
all bilinear forms on $\mathcal{U} \oplus \mathcal{V}$. For each pair of vectors $x$ and $y$, with $x$ in
$\mathcal{U}$ and $y$ in $\mathcal{V}$, the tensor product $z = x \otimes y$ of $x$ and $y$ is the element of
$\mathcal{U} \otimes \mathcal{V}$ defined by $z(w) = w(x, y)$ for every bilinear form $w$.


This definition is one of the quickest rigorous approaches to the theory, 
but it does lead to some unpleasant technical complications later. What- 
ever its disadvantages, however, we observe that it obviously has the two 
desired properties: it is clear, namely, that dimension is multiplicative (see
§ 23, Theorem 2, and § 15, Theorem 2), and it is clear that $x \otimes y$ depends
linearly on each of its factors.

Another possible (and deservedly popular) definition of tensor product
is by formal products. According to that definition $\mathcal{U} \otimes \mathcal{V}$ is obtained by
considering all symbols of the form $\sum_i \alpha_i (x_i \otimes y_i)$, and, within the set of
such symbols, making the identifications demanded by the linearity of the
vector operations and the bilinearity of tensor multiplication. (For the
purist: in this definition $x \otimes y$ stands merely for the ordered pair of $x$ and
$y$; the multiplication sign is just a reminder of what to expect.) Neither
definition is simple; we adopted the one we gave because it seemed more in
keeping with the spirit of the rest of the book. The main disadvantage of
our definition is that it does not readily extend to the most useful
generalizations of finite-dimensional vector spaces, that is, to modules and to
infinite-dimensional spaces.

For the present we prove only one theorem about tensor products. The 
theorem is a further justification of the product terminology, and, inciden- 
tally, it is a sharpening of the assertion that dimension is multiplicative. 


THEOREM. If $X = \{x_1, \dots, x_n\}$ and $Y = \{y_1, \dots, y_m\}$ are bases in $\mathcal{U}$
and $\mathcal{V}$ respectively, then the set $Z$ of vectors $z_{ij} = x_i \otimes y_j$ ($i = 1, \dots, n$;
$j = 1, \dots, m$) is a basis in $\mathcal{U} \otimes \mathcal{V}$.

PROOF. Let $w_{pq}$ be the bilinear form on $\mathcal{U} \oplus \mathcal{V}$ such that $w_{pq}(x_i, y_j)
= \delta_{ip}\delta_{jq}$ ($i, p = 1, \dots, n$; $j, q = 1, \dots, m$); the existence of such bilinear
forms, and the fact that they constitute a basis for all bilinear forms, follow
from § 23, Theorem 2. Let $\{w'_{pq}\}$ be the dual basis in $\mathcal{U} \otimes \mathcal{V}$, so that
$[w_{ij}, w'_{pq}] = \delta_{ip}\delta_{jq}$. If $w = \sum_p \sum_q \alpha_{pq} w_{pq}$ is an arbitrary bilinear form
on $\mathcal{U} \oplus \mathcal{V}$, then

$$w'_{ij}(w) = [w, w'_{ij}] = \sum_p \sum_q \alpha_{pq} [w_{pq}, w'_{ij}] = \alpha_{ij} = w(x_i, y_j) = z_{ij}(w).$$

The conclusion follows from the fact that the vectors $w'_{ij}$ do constitute a
basis of $\mathcal{U} \otimes \mathcal{V}$.
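
The theorem can be checked in a small case. The sketch below, in Python, is an illustration under stated assumptions: bilinear forms on $\mathbb{R}^2 \oplus \mathbb{R}^3$ are stored as coefficient tables, the helper names (`bilinear`, `tensor`, `unit`, `expansion`) are hypothetical, and indices start at 0 rather than 1. It verifies that $x \otimes y$ agrees with the combination of product basis vectors whose coefficients are the products of the coordinates of $x$ and $y$.

```python
from itertools import product as cartesian

def bilinear(a):
    """Bilinear form with w(x_i, y_j) = a[i][j] on the standard bases."""
    return lambda x, y: sum(a[i][j] * x[i] * y[j]
                            for i in range(len(x)) for j in range(len(y)))

def tensor(x, y):
    """x (tensor) y as a functional on bilinear forms: w -> w(x, y)."""
    return lambda w: w(x, y)

def unit(n, i):
    """The i-th standard basis vector of length n (0-indexed)."""
    return [1 if p == i else 0 for p in range(n)]

n, m = 2, 3
x, y = [2, -1], [1, 0, 3]

# Candidate coordinates of x (tensor) y in the product basis z_ij.
coords = {(i, j): x[i] * y[j] for i, j in cartesian(range(n), range(m))}

def expansion(w):
    """Evaluate sum_ij coords[i, j] * z_ij at the bilinear form w."""
    return sum(c * tensor(unit(n, i), unit(m, j))(w)
               for (i, j), c in coords.items())

# The forms w_pq of the proof form a basis, so agreement on them suffices.
w_pq = {(p, q): bilinear([[1 if (i, j) == (p, q) else 0 for j in range(m)]
                          for i in range(n)])
        for p, q in cartesian(range(n), range(m))}
agree = all(tensor(x, y)(w) == expansion(w) for w in w_pq.values())
```

Agreement on the basis forms $w_{pq}$ is enough because both sides are linear functionals on the space of bilinear forms.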


EXERCISES 


1. If $x = (1, 1)$ and $y = (1, 1, 1)$ are vectors in $\mathbb{R}^2$ and $\mathbb{R}^3$ respectively, find the
coordinates of $x \otimes y$ in $\mathbb{R}^2 \otimes \mathbb{R}^3$ with respect to the product basis $\{x_i \otimes y_j\}$,
where $x_i = (\delta_{i1}, \delta_{i2})$ and $y_j = (\delta_{j1}, \delta_{j2}, \delta_{j3})$.


2. Let $\mathcal{P}_{n,m}$ be the space of all polynomials $z$ with complex coefficients, in two
variables $s$ and $t$, such that either $z = 0$ or else the degree of $z(s, t)$ is $\le m - 1$
for each fixed $s$ and $\le n - 1$ for each fixed $t$. Prove that there exists an
isomorphism between $\mathcal{P}_n \otimes \mathcal{P}_m$ and $\mathcal{P}_{n,m}$ such that the element $z$ of $\mathcal{P}_{n,m}$ that
corresponds to $x \otimes y$ ($x$ in $\mathcal{P}_n$, $y$ in $\mathcal{P}_m$) is given by $z(s, t) = x(s)y(t)$.


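3. To what extent is the formation of tensor products commutative and
associative? What about the distributive law $\mathcal{U} \otimes (\mathcal{V} \oplus \mathcal{W}) = (\mathcal{U} \otimes \mathcal{V}) \oplus (\mathcal{U} \otimes \mathcal{W})$?


4. If $\mathcal{V}$ is a finite-dimensional vector space, and if $x$ and $y$ are in $\mathcal{V}$, is it true
that $x \otimes y = y \otimes x$?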


5. (a) Suppose that $\mathcal{V}$ is a finite-dimensional real vector space, and let $\mathcal{U}$ be
the set $\mathbb{C}$ of all complex numbers regarded as a (two-dimensional) real vector
space. Form the tensor product $\mathcal{V}^+ = \mathcal{U} \otimes \mathcal{V}$. Prove that there is a way of
defining products of complex numbers with elements of $\mathcal{V}^+$ so that $\alpha(x \otimes y)
= \alpha x \otimes y$ whenever $\alpha$ and $x$ are in $\mathbb{C}$ and $y$ is in $\mathcal{V}$.

(b) Prove that with respect to vector addition, and with respect to complex
scalar multiplication as defined in (a), the space $\mathcal{V}^+$ is a complex vector space.

(c) Find the dimension of the complex vector space $\mathcal{V}^+$ in terms of the
dimension of the real vector space $\mathcal{V}$.

(d) Prove that the vector space $\mathcal{V}$ is isomorphic to a subspace in $\mathcal{V}^+$ (when the
latter is regarded as a real vector space).

The moral of this exercise is that not only can every complex vector space be
regarded as a real vector space, but, in a certain sense, the converse is true. The
vector space $\mathcal{V}^+$ is called the complexification of $\mathcal{V}$.


6. If $\mathcal{U}$ and $\mathcal{V}$ are finite-dimensional vector spaces, what is the dual space of
$\mathcal{U}' \otimes \mathcal{V}'$?


§ 26. Permutations 


The main subject of this book is usually known as linear algebra. In the
last three sections, however, the emphasis was on something called
multilinear algebra. It is hard to say exactly where the dividing line is between
the two subjects. Since, in any case, both are quite extensive, it would not
be practical to try to stuff a detailed treatment of both into the same vol- 
ume. Nor is it desirable to discuss linear algebra in its absolutely pure 
state; the addition of even a small part of the multilinear theory (such as 
is involved in the modern view of tensor products and determinants) ex- 
tends the domain of applicability of the linear theory pleasantly out of 
proportion with the effort involved. We propose, accordingly, to continue 
the study of multilinear algebra; our intention is to draw a more or less 
straight line between what we already know and the basic facts about de- 
terminants. With that in mind, we shall devote three sections to the dis- 
cussion of some simple facts about combinatorics; the connection between 
those facts and multilinear algebra will appear immediately after that 
discussion. 

By a permutation of the integers between $1$ and $k$ (inclusive) we shall
mean a one-to-one transformation that assigns to each such integer another
one (or possibly the same one). To say that the transformation $\pi$ is
one-to-one means, of course, that if $\pi(1), \dots, \pi(k)$ are the integers that $\pi$
assigns to $1, \dots, k$, respectively, then $\pi(i) = \pi(j)$ can happen only in case
$i = j$. Since this implies that both the sets $\{1, \dots, k\}$ and $\{\pi(1), \dots, \pi(k)\}$
consist of exactly $k$ elements, it follows that they consist of exactly the
same elements. From this, in turn, we infer that a permutation $\pi$ of the
set $\{1, \dots, k\}$ maps that set onto itself, that is, that if $1 \le j \le k$, then
there exists at least one $i$ (and, in fact, exactly one) such that $\pi(i) = j$.
The total number of the integers under consideration, namely, $k$, will be
held fixed throughout the following discussion.

The theory of permutations, like everything else, is best understood by 
staring hard at some non-trivial examples. Before presenting any exam- 
ples, however, we shall first mention some of the general things that can be 
done with permutations; by this means the examples will illustrate not only 
the basic concept but also its basic properties. 

If $\sigma$ and $\tau$ are arbitrary permutations, a permutation (to be denoted by
$\sigma\tau$) can be defined by writing

$$(\sigma\tau)(i) = \sigma(\tau(i))$$

for each $i$. To prove that $\sigma\tau$ is indeed a permutation, observe that if
$(\sigma\tau)(i) = (\sigma\tau)(j)$, then $\tau(i) = \tau(j)$ (since $\sigma$ is one-to-one), and therefore
$i = j$ (since $\tau$ is one-to-one). The permutation $\sigma\tau$ is called the product of
the permutations $\sigma$ and $\tau$. Warning: the order is important. In general
$\sigma\tau \ne \tau\sigma$, or, in other words, permutation multiplication is not commutative.
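
The warning about order is easy to confirm by computation. In the Python sketch below a permutation $\pi$ of $\{1, \dots, k\}$ is stored as the tuple $(\pi(1), \dots, \pi(k))$; this encoding and the function name `compose` are illustrative assumptions, not part of the text.

```python
def compose(sigma, tau):
    """Product sigma*tau: apply tau first, then sigma."""
    return tuple(sigma[tau[i - 1] - 1] for i in range(1, len(tau) + 1))

# A permutation pi of {1..k} is stored as the tuple (pi(1), ..., pi(k)).
sigma = (2, 1, 3)   # interchanges 1 and 2
tau = (1, 3, 2)     # interchanges 2 and 3

st = compose(sigma, tau)   # i -> sigma(tau(i))
ts = compose(tau, sigma)   # i -> tau(sigma(i))
```

The two products `st` and `ts` are different tuples, so these two permutations do not commute.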

Multiplication of permutations is associative; that is, if $\pi$, $\sigma$, and $\tau$ are
permutations, then

$$(\pi\sigma)\tau = \pi(\sigma\tau). \tag{1}$$

To prove this, we must show that

$$((\pi\sigma)\tau)(i) = (\pi(\sigma\tau))(i)$$

for every $i$. The proof consists of several applications of the definition of
product, as follows:

$$((\pi\sigma)\tau)(i) = (\pi\sigma)(\tau(i)) = \pi(\sigma(\tau(i))),$$
$$(\pi(\sigma\tau))(i) = \pi((\sigma\tau)(i)) = \pi(\sigma(\tau(i))).$$

In view of this result we may and shall omit parentheses in writing the
product of three or more permutations. The result also enables us to prove
the obvious laws of exponents. The powers of a permutation $\pi$ are defined
inductively by writing $\pi^1 = \pi$ and $\pi^{p+1} = \pi \cdot \pi^p$ for all $p = 1, 2, 3, \dots$;
the associative law implies that $\pi^p\pi^q = \pi^{p+q}$ and $(\pi^p)^q = \pi^{pq}$ for all $p$ and
$q$. Observe that any two powers of a permutation commute with each
other, that is, that $\pi^p\pi^q = \pi^q\pi^p$.

The simplest permutation is the identity (to be denoted by $\epsilon$); it is defined
by $\epsilon(i) = i$ for each $i$. If $\pi$ is an arbitrary permutation, then

$$\epsilon\pi = \pi\epsilon = \pi, \tag{2}$$

or, in other words, multiplication by $\epsilon$ leaves every permutation unaffected.
The proof is straightforward; for every $i$ we have

$$(\epsilon\pi)(i) = \epsilon(\pi(i)) = \pi(i)$$

and

$$(\pi\epsilon)(i) = \pi(\epsilon(i)) = \pi(i).$$

The permutation $\epsilon$ behaves, from the point of view of multiplication, like
the number 1. In analogy with the usual numerical convention, the zero-th
power of every permutation $\pi$ is defined by writing $\pi^0 = \epsilon$.

If $\pi$ is an arbitrary permutation, then there exists a permutation (to be
denoted by $\pi^{-1}$) such that

$$\pi^{-1}\pi = \pi\pi^{-1} = \epsilon. \tag{3}$$

To define $\pi^{-1}(j)$, where, of course, $1 \le j \le k$, find the unique $i$ such that
$\pi(i) = j$, and write $\pi^{-1}(j) = i$; the validity of (3) is an immediate
consequence of the definitions. The permutation $\pi^{-1}$ is called the inverse of $\pi$.

Let $\mathcal{S}_k$ be the set of all permutations of the integers between $1$ and $k$.
What we have proved so far is that an operation of multiplication can be
defined for the elements of $\mathcal{S}_k$ so that (1) multiplication is associative, (2)
there exists an identity element, that is, an element such that
multiplication by it leaves every element of $\mathcal{S}_k$ fixed, and (3) every element has an
inverse, that is, an element whose product with the given one is the
identity. A set satisfying (1)-(3) is called a group with respect to the concept
of product that those conditions refer to; the set $\mathcal{S}_k$, in particular, is called
the symmetric group of degree $k$. Observe that the integers $1, \dots, k$ could
be replaced by any $k$ distinct objects without affecting any of the concepts
defined above; the change would be merely a notational matter.
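
For a small value of $k$ the three group properties can be confirmed by brute force. The following Python sketch assumes the tuple encoding $(\pi(1), \dots, \pi(k))$ of a permutation and uses illustrative helper names (`compose`, `inverse`); it is a check for $k = 3$, not a proof.

```python
from itertools import permutations

def compose(sigma, tau):
    """(sigma tau)(i) = sigma(tau(i)) for permutations stored as tuples."""
    return tuple(sigma[tau[i] - 1] for i in range(len(tau)))

def inverse(pi):
    """The permutation that sends pi(i) back to i."""
    inv = [0] * len(pi)
    for i, j in enumerate(pi, start=1):   # pi(i) = j
        inv[j - 1] = i                    # so the inverse sends j to i
    return tuple(inv)

k = 3
S_k = list(permutations(range(1, k + 1)))
identity = tuple(range(1, k + 1))

associative = all(compose(compose(p, s), t) == compose(p, compose(s, t))
                  for p in S_k for s in S_k for t in S_k)
has_identity = all(compose(identity, p) == p == compose(p, identity)
                   for p in S_k)
has_inverses = all(compose(inverse(p), p) == identity == compose(p, inverse(p))
                   for p in S_k)
```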


§ 27. Cycles 


A simple example of a permutation is obtained as follows: choose any
two distinct integers between $1$ and $k$, say, $p$ and $q$, and write

$$\tau(p) = q,$$
$$\tau(q) = p,$$
$$\tau(i) = i \quad \text{whenever } i \ne p \text{ and } i \ne q.$$

The permutation $\tau$ so defined is denoted by $(p, q)$; every permutation of
this form is called a transposition. If $\tau$ is a transposition, then $\tau^2 = \epsilon$.

Another useful way of constructing examples is to choose $p$ distinct
integers between $1$ and $k$, say, $i_1, \dots, i_p$, and to write

$$\sigma(i_j) = i_{j+1} \quad \text{whenever } 1 \le j < p,$$
$$\sigma(i_p) = i_1,$$
$$\sigma(i) = i \quad \text{whenever } i \ne i_1, \dots, i \ne i_p.$$

The permutation $\sigma$ so defined is denoted by $(i_1, \dots, i_p)$. If $p = 1$, then
$\sigma = \epsilon$; if $p = 2$, then $\sigma$ is a transposition. For any $p$ with $1 < p \le k$,
every permutation of the form $(i_1, \dots, i_p)$ is called a $p$-cycle, or simply a
cycle; the 2-cycles are exactly the transpositions. Warning: it is not
assumed that $i_1 < \dots < i_p$. If, for instance, $k = 5$ and $p = 3$, then there
are twenty distinct cycles. Observe also that the notation for cycles is not
unique; the symbols $(1, 2, 3)$, $(2, 3, 1)$, and $(3, 1, 2)$ all denote the same
permutation. Two cycles $(i_1, \dots, i_p)$ and $(j_1, \dots, j_q)$ are disjoint if none
of the $i$'s is equal to any of the $j$'s. If $\sigma$ and $\tau$ are disjoint cycles, then
$\sigma\tau = \tau\sigma$, or, in other words, $\sigma$ and $\tau$ commute.
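
Several of the remarks about cycles can be checked mechanically. The Python sketch below assumes the tuple encoding $(\pi(1), \dots, \pi(k))$ of a permutation; the helper names `cycle` and `compose` are hypothetical.

```python
from itertools import permutations

def cycle(k, *items):
    """The cycle (i_1, ..., i_p) on {1..k}, stored as (pi(1), ..., pi(k))."""
    pi = list(range(1, k + 1))
    for a, b in zip(items, items[1:] + items[:1]):
        pi[a - 1] = b                     # i_j goes to i_{j+1}, i_p to i_1
    return tuple(pi)

def compose(sigma, tau):
    """(sigma tau)(i) = sigma(tau(i))."""
    return tuple(sigma[tau[i] - 1] for i in range(len(tau)))

# The notation is not unique: three symbols, one permutation.
same = cycle(5, 1, 2, 3) == cycle(5, 2, 3, 1) == cycle(5, 3, 1, 2)

# For k = 5 and p = 3: collect every 3-cycle, counting each permutation once.
three_cycles = {cycle(5, *c) for c in permutations(range(1, 6), 3)}

# Disjoint cycles commute.
sigma, tau = cycle(5, 1, 2, 3), cycle(5, 4, 5)
commute = compose(sigma, tau) == compose(tau, sigma)
```

The set `three_cycles` has twenty elements, in agreement with the count stated above.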


THEOREM 1. Every permutation is the product of pairwise disjoint cycles.


PROOF. If $\pi$ is a permutation and if $i$ is such that $\pi(i) \ne i$ (assume, for
the moment, that $\pi \ne \epsilon$), form the sequence $(i, \pi(i), \pi^2(i), \dots)$. Since
there are only a finite number of distinct integers between $1$ and $k$, there
must exist exponents $p$ and $q$ ($0 \le p < q$) such that $\pi^p(i) = \pi^q(i)$. The
one-to-one character of $\pi$ implies that $\pi^{q-p}(i) = i$, or, with an obvious
change of notation, what we have proved is that there must exist a strictly
positive exponent $p$ such that $\pi^p(i) = i$. If $p$ is selected to be the smallest
exponent with this property, then the integers $i, \dots, \pi^{p-1}(i)$ are distinct
from each other. (Indeed, if $0 \le q < r < p$ and $\pi^q(i) = \pi^r(i)$, then $\pi^{r-q}(i)
= i$, contradicting the minimality of $p$.) It follows that $(i, \dots, \pi^{p-1}(i))$
is a $p$-cycle. If there is a $j$ between $1$ and $k$ different from each of $i, \dots,
\pi^{p-1}(i)$ and different from $\pi(j)$, we repeat the procedure that led us to this
cycle, with $j$ in place of $i$. We continue forming cycles in this manner as
long as after each step we can still find a new integer that $\pi$ does not send
on itself; the product of the disjoint cycles so constructed is $\pi$. The case
$\pi = \epsilon$ is covered by the rather natural agreement that a product with no
factors, an "empty product," is to be interpreted as the identity
permutation.
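
The procedure of the proof is an algorithm, and it can be sketched in a few lines of Python. The tuple encoding $(\pi(1), \dots, \pi(k))$ and the name `disjoint_cycles` are assumptions made for this illustration.

```python
def disjoint_cycles(pi):
    """Factor pi, stored as (pi(1), ..., pi(k)), into disjoint cycles."""
    seen, cycles = set(), []
    for i in range(1, len(pi) + 1):
        if i in seen or pi[i - 1] == i:
            continue                      # already used, or a fixed point
        orbit, j = [], i
        while j not in seen:              # follow i, pi(i), pi^2(i), ...
            seen.add(j)
            orbit.append(j)
            j = pi[j - 1]
        cycles.append(tuple(orbit))
    return cycles

pi = (3, 5, 1, 4, 6, 2)                   # a permutation of {1..6}
cycles = disjoint_cycles(pi)
```

For this `pi` the result is the list `[(1, 3), (2, 5, 6)]`; the fixed point 4 contributes no factor, and the identity yields the empty list, matching the convention about empty products.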


THEOREM 2. Every cycle is a product of transpositions. 


PROOF. Suppose that $\sigma$ is a $p$-cycle; for the sake of notational simplicity,
we shall give the proof, which is perfectly general, in the special case $p = 5$.
The proof itself consists of one line:

$$(i_1, i_2, i_3, i_4, i_5) = (i_1, i_5)(i_1, i_4)(i_1, i_3)(i_1, i_2).$$

A few added words of explanation might be helpful. In view of the
definition of the product of permutations, the right side of the last equation
operates on each integer between $1$ and $k$ from the inside out, or, perhaps
more suggestively, from right to left. Thus, for example, the result of
applying $(i_1, i_5)(i_1, i_4)(i_1, i_3)(i_1, i_2)$ to $i_3$ is calculated as follows: $(i_1, i_2)(i_3)
= i_3$, $(i_1, i_3)(i_3) = i_1$, $(i_1, i_4)(i_1) = i_4$, $(i_1, i_5)(i_4) = i_4$, so that
$(i_1, i_5)(i_1, i_4)(i_1, i_3)(i_1, i_2)(i_3) = i_4$.
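
The one-line factorization and the right-to-left rule can be tested on a concrete 5-cycle. The Python sketch below uses hypothetical helper names (`transposition_factors`, `apply_product`) and a sample cycle chosen only for illustration.

```python
def transposition_factors(cyc):
    """(i_1, ..., i_p) = (i_1, i_p) ... (i_1, i_3)(i_1, i_2), left to right."""
    return [(cyc[0], c) for c in reversed(cyc[1:])]

def apply_product(factors, i):
    """Apply a product of transpositions to i, working from right to left."""
    for p, q in reversed(factors):
        if i == p:
            i = q
        elif i == q:
            i = p
    return i

cyc = (2, 5, 1, 4, 3)                     # plays the role of (i_1, ..., i_5)
factors = transposition_factors(cyc)
images = {i: apply_product(factors, i) for i in cyc}
```

Each entry of `images` sends an element of the cycle to its successor, and the last element back to the first, as the definition of a cycle requires.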

For the sake of reference we put on record the following immediate corol- 
lary of the two preceding theorems. 


THEOREM 3. Every permutation is a product of transpositions. 


Observe that the transpositions in Theorems 2 and 3 were not asserted 
to be disjoint; in general they are not. 


EXERCISES 


1. (a) How many permutations are there in $\mathcal{S}_k$?
(b) How many distinct $p$-cycles are there in $\mathcal{S}_k$ ($1 \le p \le k$)?


2. If $\sigma$ and $\tau$ are permutations (in $\mathcal{S}_k$), then $(\sigma\tau)^{-1} = \tau^{-1}\sigma^{-1}$.


3. (a) If $\sigma$ and $\tau$ are permutations (in $\mathcal{S}_k$), then there exists a unique permutation
$\pi$ such that $\sigma\pi = \tau$.
(b) If $\pi$, $\sigma$, and $\tau$ are permutations such that $\pi\sigma = \pi\tau$, then $\sigma = \tau$.


4. Give an example of a permutation that is not the product of disjoint
transpositions.


5. Prove that every permutation in $\mathcal{S}_k$ is the product of transpositions of the
form $(j, j + 1)$, where $1 \le j < k$. Is this factorization unique?


6. Is the inverse of a cycle also a cycle? 


7. Prove that the representation of a permutation as the product of disjoint 
cycles is unique except possibly for the order of the factors. 


8. The order of a permutation $\pi$ is the least integer $p$ ($> 0$) such that $\pi^p = \epsilon$.

(a) Every permutation has an order.

(b) What is the order of a $p$-cycle?

(c) If $\sigma$ is a $p$-cycle, $\tau$ is a $q$-cycle, and $\sigma$ and $\tau$ are disjoint, what is the order of
$\sigma\tau$?

(d) Give an example to show that the assumption of disjointness is essential in
(c).

(e) If $\pi$ is a permutation of order $p$ and if $\pi^q = \epsilon$, then $q$ is divisible by $p$.


9. Every permutation in $\mathcal{S}_k$ ($k > 1$) can be written as a product, each factor of
which is one of the transpositions $(1, 2), (1, 3), (1, 4), \dots, (1, k)$.


10. Two permutations $\sigma$ and $\tau$ are called conjugate if there exists a permutation
$\pi$ such that $\sigma\pi = \pi\tau$. Prove that $\sigma$ and $\tau$ are conjugate if and only if they have
the same cycle structure. (This means that in the representation of $\sigma$ as a product
of disjoint cycles, the number of $p$-cycles is, for each $p$, the same as the
corresponding number for $\tau$.)


§ 28. Parity 


Since $(1, 3)(1, 2) = (1, 2)(2, 3)$ $(= (1, 2, 3))$, we see that the
representation of a permutation (even a cycle) as a product of transpositions is not
necessarily unique. Since $(1, 3)(1, 4)(1, 2)(3, 4)(3, 2) = (1, 4)(1, 3)(1, 2)$
$(= (1, 2, 3, 4))$, we see that even the number of transpositions needed to
factor a cycle is not necessarily unique. There is, nevertheless, something
unique about the factorization, namely, whether the number of
transpositions needed is even or odd. We proceed to state this result precisely, and
to prove it.

Assume, for simplicity of notation, that $k = 4$. Let $f$ be the polynomial
(in four variables $t_1, t_2, t_3, t_4$) defined by

$$f(t_1, t_2, t_3, t_4) = (t_1 - t_2)(t_1 - t_3)(t_1 - t_4)(t_2 - t_3)(t_2 - t_4)(t_3 - t_4).$$

(In the general case $f$ is the product of all the differences $t_i - t_j$ with
$1 \le i < j \le k$.) Each permutation $\pi$ in $\mathcal{S}_4$ converts $f$ into a new
polynomial, denoted by $\pi f$; by definition

$$(\pi f)(t_1, t_2, t_3, t_4) = f(t_{\pi(1)}, t_{\pi(2)}, t_{\pi(3)}, t_{\pi(4)}).$$

In words: to obtain $\pi f$, replace each variable in $f$ by the one whose subscript
is obtained by allowing $\pi$ to act on the subscript of the given one. If, for
instance, $\tau = (2, 4)$, then

$$(\tau f)(t_1, t_2, t_3, t_4) = (t_1 - t_4)(t_1 - t_3)(t_1 - t_2)(t_4 - t_3)(t_4 - t_2)(t_3 - t_2).$$

If $\sigma = (1, 3, 2, 4)$, so that $\sigma\tau = (1, 3, 2)$, then both $(\sigma(\tau f))(t_1, t_2, t_3, t_4)$ and
$((\sigma\tau)f)(t_1, t_2, t_3, t_4)$ are equal to

$$(t_3 - t_1)(t_3 - t_2)(t_3 - t_4)(t_1 - t_2)(t_1 - t_4)(t_2 - t_4).$$


These computations illustrate, and indicate the proofs of, three
important facts. (1) For every permutation $\pi$, the factors of $\pi f$ are the same as the
factors of $f$, except possibly for sign and order; consequently $\pi f = f$ or else
$\pi f = -f$. The permutation $\pi$ is called even if $\pi f = f$ and odd if $\pi f = -f$.
The signum (or sign) of a permutation $\pi$, denoted by $\operatorname{sgn} \pi$, is $+1$ or $-1$
according as $\pi$ is even or odd, so that we always have $\pi f = (\operatorname{sgn} \pi)f$. The
fact that $\pi$ is even, or odd, is sometimes expressed by saying that the parity
of $\pi$ is even, or odd, respectively. (2) If $\tau$ is a transposition, then $\operatorname{sgn} \tau =
-1$, or, equivalently, every transposition is odd. The proof is the obvious
generalization of the following reasoning about the special example $(2, 4)$.
Exactly one factor of $f$ contains both $t_2$ and $t_4$, and that one changes sign
in the passage from $f$ to $\tau f$. If a factor contains neither $t_2$ nor $t_4$, it stays
fixed. The factors containing only one of $t_2$ and $t_4$ come in pairs (such as
the pair $(t_2 - t_3)$ and $(t_3 - t_4)$, or the pair $(t_1 - t_2)$ and $(t_1 - t_4)$). Each
factor in such a pair goes into the other factor, except possibly that its
sign may change; if it changes for one factor, it will change for its mate.
(3) If $\sigma$ and $\tau$ are permutations, then $(\sigma\tau)f = \sigma(\tau f)$; consequently $\sigma\tau$ is even
if and only if $\sigma$ and $\tau$ have the same parity. Observe that $\operatorname{sgn}(\sigma\tau) =
(\operatorname{sgn} \sigma)(\operatorname{sgn} \tau)$.

It follows from (2) and (3) that a product of a bunch of transpositions
is even if and only if there are an even number of them, and it is odd
otherwise. (Note, in particular, by looking at the proof of § 27, Theorem 2,
that a $p$-cycle is even if and only if $p$ is odd; in other words, if $\sigma$ is a $p$-cycle,
then $\operatorname{sgn} \sigma = (-1)^{p+1}$.) Conclusion: no matter how a permutation $\pi$ is
factored into transpositions, the number of factors is always even (this is
the case if $\pi$ is even), or else it is always odd (this is the case if $\pi$ is odd).
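
The definition of the sign by means of $f$ can be turned into a computation. The Python sketch below rests on one stated assumption: since $\pi f = \pm f$ as polynomials, it is enough to compare values at a single point where $f$ does not vanish, and the point $t_i = i$ is used here. The tuple encoding of permutations and the helper names are illustrative.

```python
from itertools import permutations
from math import prod

def f(t):
    """Product of all differences t_i - t_j with i < j."""
    return prod(t[i] - t[j]
                for i in range(len(t)) for j in range(i + 1, len(t)))

def sgn(pi):
    """Sign of pi, read off from (pi f)(t) at the sample point t_i = i."""
    t = tuple(range(1, len(pi) + 1))
    pf = f(tuple(t[p - 1] for p in pi))   # f(t_pi(1), ..., t_pi(k))
    return pf // f(t)                     # exact: pf is f(t) or -f(t)

def compose(sigma, tau):
    """(sigma tau)(i) = sigma(tau(i))."""
    return tuple(sigma[tau[i] - 1] for i in range(len(tau)))

tau = (1, 4, 3, 2)                        # the transposition (2, 4)
sigma = (3, 4, 2, 1)                      # the cycle (1, 3, 2, 4)
S4 = list(permutations(range(1, 5)))
multiplicative = all(sgn(compose(a, b)) == sgn(a) * sgn(b)
                     for a in S4 for b in S4)
```

Here `sgn(tau)` and `sgn(sigma)` are both $-1$, while their product $(1, 3, 2)$ has sign $+1$, in agreement with the rule for $p$-cycles.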

The product of two even permutations is even; the inverse of an even
permutation is even; the identity permutation is even. These facts are
summed up by saying that the set of all even permutations is a subgroup
of $\mathcal{S}_k$; this subgroup (to be denoted by $\mathcal{A}_k$) is called the alternating group of
degree $k$.


EXERCISES


1. How many permutations are there in $\mathcal{A}_k$?


2. Give examples of even permutations with even order and even permutations
with odd order; do the same for odd permutations.


3. Every permutation in $\mathcal{A}_k$ ($k > 2$) can be written as a product, each factor of
which is one of the 3-cycles $(1, 2, 3), (1, 2, 4), \dots, (1, 2, k)$.


§ 29. Multilinear forms 


We are now ready to proceed with multilinear algebra. The basic
concept is that of multilinear form (or functional), an easy generalization of
the concept of bilinear form. Suppose that $\mathcal{V}_1, \dots, \mathcal{V}_k$ are vector spaces
(over the same field); a $k$-linear form ($k = 1, 2, 3, \dots$) is a scalar-valued
function on the direct sum $\mathcal{V}_1 \oplus \dots \oplus \mathcal{V}_k$ with the property that for each
fixed value of any $k - 1$ arguments it depends linearly on the remaining
argument. The 1-linear forms are simply the linear functionals (on $\mathcal{V}_1$), and
the 2-linear forms are the bilinear forms (on $\mathcal{V}_1 \oplus \mathcal{V}_2$). The 3-linear (or
trilinear) forms are the scalar-valued functions $w$ (on $\mathcal{V}_1 \oplus \mathcal{V}_2 \oplus \mathcal{V}_3$) such
that

$$w(\alpha_1 x_1 + \alpha_2 x_2, y, z) = \alpha_1 w(x_1, y, z) + \alpha_2 w(x_2, y, z),$$

and such that similar identities hold for $w(x, \alpha_1 y_1 + \alpha_2 y_2, z)$ and $w(x, y,
\alpha_1 z_1 + \alpha_2 z_2)$. A function that is $k$-linear for some $k$ is called a multilinear
form.

Much of the theory of bilinear forms extends easily to the multilinear
case. Thus, for instance, if $w_1$ and $w_2$ are $k$-linear forms, if $\alpha_1$ and $\alpha_2$ are
scalars, and if $w$ is defined by

$$w(x_1, \dots, x_k) = \alpha_1 w_1(x_1, \dots, x_k) + \alpha_2 w_2(x_1, \dots, x_k)$$

whenever $x_i$ is in $\mathcal{V}_i$, $i = 1, \dots, k$, then $w$ is a $k$-linear form, denoted by
$\alpha_1 w_1 + \alpha_2 w_2$. The set of all $k$-linear forms is a vector space with respect
to this definition of the linear operations; the dimension of that vector
space is the product $n_1 \cdots n_k$, where, of course, $n_i$ is the dimension of $\mathcal{V}_i$.
The proofs of all these statements are just like the proofs (in § 23) of the 
corresponding statements for the bilinear case. We could go on imitating 
the bilinear theory and, in particular, studying multiple tensor products. 
In order to hold our multilinear digression to a minimum, we shall proceed 
instead in a different, more special, and, for our purposes, more useful 
direction. 

In what follows we shall restrict our attention to the case in which the
$k$ spaces $\mathcal{V}_i$ are all equal to one and the same vector space, say, $\mathcal{V}$; we shall
assume that $\mathcal{V}$ is finite-dimensional. In this case we shall call a "$k$-linear
form on $\mathcal{V} \oplus \dots \oplus \mathcal{V}$" simply a "$k$-linear form on $\mathcal{V}$," or, even more
simply, a "$k$-linear form"; the language is slightly inaccurate but, in
context, completely unambiguous. If the dimension of $\mathcal{V}$ is $n$, then the
dimension of the vector space of all $k$-linear forms is $n^k$. The space $\mathcal{V}$ and, of
course, the dimension $n$ will be held fixed throughout the following
discussion.

The special character of the case we are studying enables us to apply a
technique that is not universally available; the technique is to operate on
$k$-linear forms by permutations in $\mathcal{S}_k$. If $w$ is a $k$-linear form, and if $\pi$ is in
$\mathcal{S}_k$, we write

$$\pi w(x_1, \dots, x_k) = w(x_{\pi(1)}, \dots, x_{\pi(k)})$$

whenever $x_1, \dots, x_k$ are in $\mathcal{V}$. The function $\pi w$ so defined is again a $k$-linear
form. (The value of $\pi w$ at $(x_1, \dots, x_k)$ is more honestly denoted by
$(\pi w)(x_1, \dots, x_k)$; since, however, the simpler notation does not appear to
lead to any confusion, we shall continue to use it.)

Using the way permutations act on $k$-linear forms, we can define some
interesting sets of such forms. Thus, for instance, a $k$-linear form $w$ is
called symmetric if $\pi w = w$ for every permutation $\pi$ in $\mathcal{S}_k$. (Note that if
$k = 1$, then this condition is trivially satisfied.) The set of all symmetric
$k$-linear forms is a subspace of the space of all $k$-linear forms. Hence, in
particular, the origin of that space, the $k$-linear form 0, is symmetric. For
a non-trivial example, suppose that $k = 2$, let $y_1$ and $y_2$ be linear
functionals on $\mathcal{V}$, and write

$$w(x_1, x_2) = y_1(x_1)y_2(x_2) + y_1(x_2)y_2(x_1).$$


This procedure for constructing $k$-linear forms has useful generalizations.
Thus, for instance, if $1 \le h < k \le n$, and if $u$ is an $h$-linear form and $v$ is
a $(k - h)$-linear form, then the equation

$$w(x_1, \dots, x_k) = u(x_1, \dots, x_h) \cdot v(x_{h+1}, \dots, x_k)$$

defines a $k$-linear form $w$, which, in general, is not symmetric. A symmetric
$k$-linear form can be obtained from $w$ (or, for that matter, from any given
$k$-linear form) by forming $\sum \pi w$, where the summation is extended over
all permutations $\pi$ in $\mathcal{S}_k$.
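
The action of permutations on forms and the symmetrizing sum can be sketched in Python. The sketch makes several assumptions for illustration only: a $k$-linear form is represented by an ordinary function of $k$ vectors, vectors are tuples of numbers, and the names `act` and `symmetrize` are hypothetical.

```python
from itertools import permutations

def act(pi, w):
    """(pi w)(x_1, ..., x_k) = w(x_pi(1), ..., x_pi(k))."""
    return lambda *xs: w(*(xs[p - 1] for p in pi))

def symmetrize(w, k):
    """Sum of pi w over all permutations pi of {1..k}."""
    perms = list(permutations(range(1, k + 1)))
    return lambda *xs: sum(act(pi, w)(*xs) for pi in perms)

# A bilinear form on R^2 that is not symmetric: w(x, y) = u(x) v(y).
u = lambda x: x[0] + 2 * x[1]
v = lambda y: 3 * y[0] - y[1]
w = lambda x, y: u(x) * v(y)

s = symmetrize(w, 2)
a, b = (1, 0), (0, 1)
```

For these sample vectors `w(a, b)` and `w(b, a)` differ, while `s(a, b)` and `s(b, a)` agree.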

We shall not study symmetric $k$-linear forms any more. We introduced
them here because they constitute a very natural class of functions
definable in terms of permutations. We abandon them now in favor of
another class of functions, which play a much greater role in the theory.


§ 30. Alternating forms


A $k$-linear form $w$ is skew-symmetric if $\pi w = -w$ for every odd
permutation $\pi$ in $\mathcal{S}_k$. Equivalently, $w$ is skew-symmetric if $\pi w = (\operatorname{sgn} \pi)w$ for every
permutation $\pi$ in $\mathcal{S}_k$. (If $\pi w = (\operatorname{sgn} \pi)w$ for all $\pi$, then, in particular, $\pi w
= -w$ whenever $\pi$ is odd. If, conversely, $\pi w = -w$ for all odd $\pi$, then,
given an arbitrary $\pi$, factor it into transpositions, say, $\pi = \tau_1 \cdots \tau_q$,
observe that $\operatorname{sgn} \pi = (-1)^q$, and, since $\pi w = (-1)^q w$, conclude that $\pi w =
(\operatorname{sgn} \pi)w$, as asserted. This proof makes tacit use of the unproved but
easily available fact that if $\sigma$ and $\tau$ are permutations in $\mathcal{S}_k$, then $\sigma(\tau w) =
(\sigma\tau)w$.) The set of all skew-symmetric $k$-linear forms is a subspace of the
space of all $k$-linear forms. To get a non-trivial example of a
skew-symmetric bilinear form $w$, let $y_1$ and $y_2$ be linear functionals and write

$$w(x_1, x_2) = y_1(x_1)y_2(x_2) - y_1(x_2)y_2(x_1).$$

More generally, if $w$ is an arbitrary $k$-linear form, a skew-symmetric $k$-linear
form can be obtained from $w$ by forming $\sum (\operatorname{sgn} \pi)\pi w$, where the
summation is extended over all permutations $\pi$ in $\mathcal{S}_k$.
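
The sum $\sum (\operatorname{sgn} \pi)\pi w$ can be computed directly for a small example. In the Python sketch below the sign is obtained by counting the pairs $i < j$ with $\pi(i) > \pi(j)$, which are exactly the factors of the polynomial $f$ of § 28 that change sign under $\pi$; this shortcut, the representation of forms as functions, and the helper names are assumptions of the sketch.

```python
from itertools import permutations

def sgn(pi):
    """(-1) to the number of pairs i < j with pi(i) > pi(j)."""
    k = len(pi)
    inversions = sum(pi[i] > pi[j]
                     for i in range(k) for j in range(i + 1, k))
    return -1 if inversions % 2 else 1

def act(pi, w):
    """(pi w)(x_1, ..., x_k) = w(x_pi(1), ..., x_pi(k))."""
    return lambda *xs: w(*(xs[p - 1] for p in pi))

def skew_symmetrize(w, k):
    """Sum of (sgn pi) * (pi w) over all permutations pi of {1..k}."""
    perms = list(permutations(range(1, k + 1)))
    return lambda *xs: sum(sgn(pi) * act(pi, w)(*xs) for pi in perms)

# A trilinear form on R^3 built from the coordinate functionals.
w = lambda x, y, z: x[0] * y[1] * z[2]
a = skew_symmetrize(w, 3)

x, y, z = (1, 2, 3), (0, 1, 4), (5, 6, 0)
```

Interchanging two arguments of `a` changes the sign of its value, and `a` vanishes when two arguments coincide; for this particular `w` the value `a(x, y, z)` is the familiar 3-by-3 determinant of the array with rows `x`, `y`, `z`.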

A $k$-linear form $w$ is called alternating if $w(x_1, \dots, x_k) = 0$ whenever two
of the $x$'s are equal. (Note that if $k = 1$, then this condition is vacuously
satisfied.) The set of all alternating $k$-linear forms is a subspace of the
space of all $k$-linear forms. There is an important relation between
alternating and skew-symmetric forms.


THEOREM 1. Every alternating multilinear form is skew-symmetric. 


PROOF. Suppose that $w$ is an alternating $k$-linear form, and that $i$ and $j$
are integers, $1 \le i < j \le k$. If $x_1, \dots, x_k$ are vectors, we write

$$w_0(x_i, x_j) = w(x_1, \dots, x_k);$$

if the $x$'s other than $x_i$ and $x_j$ are held fixed (temporarily), then $w_0$ is an
alternating bilinear form of its two arguments. Since, by bilinearity,

$$w_0(x_i + x_j, x_i + x_j) = w_0(x_i, x_i) + w_0(x_i, x_j) + w_0(x_j, x_i) + w_0(x_j, x_j),$$

and since, by the alternating character of $w_0$, the left side and the two
extreme terms of the right side of this equation all vanish, we see that $w_0(x_j, x_i)
= -w_0(x_i, x_j)$. This, however, says that

$$(i, j)w(x_1, \dots, x_k) = -w(x_1, \dots, x_k),$$

or, since the $x$'s are arbitrary, that $(i, j)w = -w$. Since every odd
permutation $\pi$ is the product of an odd number of transpositions, such as $(i, j)$,
it follows that $\pi w = -w$ for every odd $\pi$, and the proof of the theorem is
complete.

The connection between alternating forms and skew-symmetric ones
involves one subtle point. Consider the following "proof" of the converse of
Theorem 1: if $w$ is a skew-symmetric $k$-linear form, if $1 \le i < j \le k$, and
if $x_1, \dots, x_k$ are vectors such that $x_i = x_j$, then $(i, j)w(x_1, \dots, x_k) =
w(x_1, \dots, x_k)$ since $x_i = x_j$, and at the same time, $(i, j)w(x_1, \dots, x_k) =
-w(x_1, \dots, x_k)$ since $w$ is skew-symmetric; consequently $w(x_1, \dots, x_k)
= -w(x_1, \dots, x_k)$, so that $w$ is alternating. This argument is wrong; the
trouble is in the inference "if $w = -w$, then $w = 0$." If we examine that
inference in more detail, we find that it is based on the following reasoning:
if $w = -w$, then $w + w = 0$, so that $(1 + 1)w = 0$. This is correct. The
trouble is that in certain fields $1 + 1 = 0$, and therefore the inference from
$(1 + 1)w = 0$ to $w = 0$ is not justified; the converse of Theorem 1 is, in
fact, false for vector spaces over such fields.
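
A concrete instance of the failure is easy to exhibit. The Python sketch below works over the integers modulo 2, a field in which $1 + 1 = 0$, and takes the one-dimensional vector space over that field; the particular form $w(x, y) = xy$ is an example chosen for this illustration and does not come from the text.

```python
P = 2                                     # arithmetic modulo 2: 1 + 1 = 0

def w(x, y):
    """A bilinear form on the one-dimensional space over the integers mod 2."""
    return (x * y) % P

field = range(P)
skew_symmetric = all(w(y, x) == (-w(x, y)) % P
                     for x in field for y in field)
alternating = all(w(x, x) == 0 for x in field)
```

Here `skew_symmetric` is true while `alternating` is false, because `w(1, 1)` is 1.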


THEOREM 2. If $x_1, \dots, x_k$ are linearly dependent vectors and if $w$ is an
alternating $k$-linear form, then $w(x_1, \dots, x_k) = 0$.


PROOF. If $x_i = 0$ for some $i$, the conclusion is trivial. If all the $x_i$ are
different from 0, we apply the theorem of § 6 to find an $x_h$, $2 \le h \le k$,
that is a linear combination of the preceding ones. If, say, $x_h = \sum_{i=1}^{h-1} \alpha_i x_i$,
replace $x_h$ in $w(x_1, \dots, x_k)$ by this expansion, use the linearity of
$w(x_1, \dots, x_k)$ in its $h$-th argument, and draw the desired conclusion by an
$(h - 1)$-fold application of the assumption that $w$ is alternating.

In one extreme case (namely, when k = n) a sort of converse of Theorem 
2 is true. 


THEOREM 3. If $w$ is a non-zero alternating $n$-linear form, and if $x_1, \dots, x_n$
are linearly independent vectors, then $w(x_1, \dots, x_n) \ne 0$.


PROOF. Since (§ 8, Theorem 2) the vectors $x_1, \dots, x_n$ form a basis, we
may, given an arbitrary set of $n$ vectors $y_1, \dots, y_n$, write each $y$ as a linear
combination of the $x$'s. If we replace each $y$ in $w(y_1, \dots, y_n)$ by the
corresponding linear combination of $x$'s and expand the result by
multilinearity, we obtain a long linear combination of terms such as $w(z_1, \dots, z_n)$,
where each $z$ is one of the $x$'s. If, in such a term, two of the $z$'s coincide,
then, since $w$ is alternating, that term must vanish. If, on the other hand,
all the $z$'s are distinct, then $w(z_1, \dots, z_n) = \pi w(x_1, \dots, x_n)$ for some
permutation $\pi$. Since (Theorem 1) $w$ is skew-symmetric, it follows that
$w(z_1, \dots, z_n) = (\operatorname{sgn} \pi)w(x_1, \dots, x_n)$. If $w(x_1, \dots, x_n)$ were 0, it would
follow that $w(z_1, \dots, z_n) = 0$, and hence that $w(y_1, \dots, y_n) = 0$ for all
$y_1, \dots, y_n$, contradicting the assumption that $w \ne 0$.

The proof (not the statement) of this result yields a valuable corollary.


THEOREM 4. Any two alternating $n$-linear forms are linearly dependent.


PROOF. Suppose that $w_1$ and $w_2$ are alternating $n$-linear forms and that
$\{x_1, \dots, x_n\}$ is a basis. Given any $n$ vectors $y_1, \dots, y_n$, write each of them
as a linear combination of the $x$'s, and, just as above, replace each of them,
in both $w_1(y_1, \dots, y_n)$ and $w_2(y_1, \dots, y_n)$, by the corresponding linear
combination. It follows that each of $w_1(y_1, \dots, y_n)$ and $w_2(y_1, \dots, y_n)$ is a
linear combination (the same linear combination) of terms such as $w_1(z_1,
\dots, z_n)$ and $w_2(z_1, \dots, z_n)$, where each $z$ is one of the $x$'s. Since $w_1(x_1, \dots,
x_n)$ and $w_2(x_1, \dots, x_n)$ are scalars, they are linearly dependent, so that
there exist scalars $\alpha_1$ and $\alpha_2$, not both zero, such that $\alpha_1 w_1(x_1, \dots, x_n)
+ \alpha_2 w_2(x_1, \dots, x_n) = 0$; from these facts we may infer that $\alpha_1 w_1 + \alpha_2 w_2
= 0$, as asserted.


§31. Alternating forms of maximal degree 


Glancing back at the last section, the reader will observe that we did not 
give any non-trivial examples of alternating k-linear forms, and we did not 
even indirectly hint at any existence theorem concerning them. In fact 
they do not always exist; § 30, Theorem 2 implies, for instance, that if 
k > n, then 0 is the only alternating k-linear form. (See § 8, Theorem 2.) 
For the applications we have in mind, we need only one existence theorem; 
we proceed to prove a rather sharp form of it. 


THEOREM. If n > 0, the vector space of alternating n-linear forms on 
an n-dimensional vector space is one-dimensional. 


PROOF. We show first that if $1 \leq k \leq n$, then there exists at least one
non-zero alternating k-linear form; the proof goes by induction on k. If
k = 1, the desired result follows from the existence of non-trivial linear
functionals (see § 15, Theorem 3). If $1 \leq k < n$, we assume that v is a non-zero
alternating k-linear form; using v we shall construct a non-zero alternating
(k + 1)-linear form w. Since $v \neq 0$, we can find vectors $x_1^0, \dots, x_k^0$
such that $v(x_1^0, \dots, x_k^0) \neq 0$ (the superscripts are just indices here). Since
k < n, we can find a vector $x_{k+1}^0$ that does not belong to the subspace
spanned by $x_1^0, \dots, x_k^0$, and (see § 17, Theorem 1) then we can find a linear
functional u such that $u(x_1^0) = \cdots = u(x_k^0) = 0$ and $u(x_{k+1}^0) \neq 0$.

The promised (k + 1)-linear form w is obtained from the linear functional
u and the k-linear form v by writing

    $$w(x_1, \dots, x_k, x_{k+1}) = \sum_{i=1}^{k} (i, k+1)\, v(x_1, \dots, x_k)\, u(x_{k+1}) - v(x_1, \dots, x_k)\, u(x_{k+1}). \tag{1}$$

Here the transposition $(i, k+1)$ acts on the expression that follows it by
interchanging $x_i$ and $x_{k+1}$. Thus, for instance, if k = 3, then

    $$w(x_1, x_2, x_3, x_4) = v(x_4, x_2, x_3)u(x_1) + v(x_1, x_4, x_3)u(x_2) + v(x_1, x_2, x_4)u(x_3) - v(x_1, x_2, x_3)u(x_4).$$


It follows from the elementary discussion in § 29 that w is indeed a (k + 1)- 
linear form; we are to prove that it is non-zero and alternating. 

The fact that w is not identically zero is easy to prove. Indeed, since
$u(x_i^0) = 0$ for $i = 1, \dots, k$, it follows that if we replace each $x_i$ by $x_i^0$ in
(1), $i = 1, \dots, k + 1$, then the first k terms of the sum on the right all
vanish, and, consequently,

    $$w(x_1^0, \dots, x_k^0, x_{k+1}^0) = -v(x_1^0, \dots, x_k^0)\, u(x_{k+1}^0) \neq 0. \tag{2}$$


Suppose now that $x_1, \dots, x_k, x_{k+1}$ are vectors and i and j are integers
such that $1 \leq i < j \leq k + 1$ and $x_i = x_j$. We are to prove that, under
these circumstances, $w(x_1, \dots, x_k, x_{k+1}) = 0$. We note that both $x_i$ and
$x_j$ occur in the argument of v in all but two of the k + 1 terms on the right
side of (1). Since v is alternating, the terms in which both $x_i$ and $x_j$ do so
occur all vanish.

The remainder of the proof splits naturally into two cases. If j = k + 1,
then all that is left is

    $$(i, k+1)\, v(x_1, \dots, x_k)\, u(x_{k+1}) - v(x_1, \dots, x_k)\, u(x_{k+1}),$$

and, since $x_i = x_{k+1}$, this is clearly equal to 0. If $j \leq k$, then each of the
two possibly non-vanishing terms that are still left can be obtained from
the other by an application of the transposition $(i, j)$. It follows that those
terms differ in sign only, and hence that their sum is zero. This proves
that w is alternating and proves, therefore, that the dimension of the space
of alternating n-linear forms is not less than 1.

The fact that the dimension of the space of alternating n-linear forms 
is not more than 1 is an immediate consequence of § 30, Theorem 4. 
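
The inductive construction can be tried out numerically. The following is a minimal sketch, assuming vectors are n-tuples of integers and taking the coordinate functionals as the functionals u; the names `extend`, `alternating_form`, and `leibniz` are illustrative and do not come from the text. It builds w by repeated use of formula (1) and compares it with the familiar sum over permutations, which is one particular alternating n-linear form.

```python
from itertools import permutations

def extend(v, u, k):
    """Formula (1): build a (k+1)-linear form w from a k-linear form v
    and a linear functional u."""
    def w(*xs):
        head, last = list(xs[:k]), xs[k]
        total = 0
        for i in range(k):
            swapped = head.copy()
            swapped[i] = last          # the transposition (i, k+1) swaps x_i and x_{k+1}
            total += v(*swapped) * u(head[i])
        return total - v(*head) * u(last)
    return w

def alternating_form(n):
    """Induction on k, starting from the first coordinate functional (assumed choice)."""
    w = lambda x: x[0]
    for k in range(1, n):
        # u vanishes on the first k coordinate vectors but not on the (k+1)-st
        w = extend(w, lambda x, j=k: x[j], k)
    return w

def sign(p):
    inversions = sum(p[a] > p[b] for a in range(len(p)) for b in range(a + 1, len(p)))
    return -1 if inversions % 2 else 1

def leibniz(vectors):
    """Reference alternating n-linear form: the signed sum over permutations."""
    total = 0
    for p in permutations(range(len(vectors))):
        term = sign(p)
        for position, coordinate in enumerate(p):
            term *= vectors[position][coordinate]
        total += term
    return total

n = 3
w = alternating_form(n)
basis = [tuple(int(i == j) for i in range(n)) for j in range(n)]
c = w(*basis)                          # non-zero, as in (2)
ys = [(2, 1, 0), (1, 3, 1), (0, 1, 4)]
print(c, w(*ys), c * leibniz(ys))      # the last two numbers agree
```

The agreement of the last two numbers illustrates § 30, Theorem 4: the constructed form is a scalar multiple of the reference form, the scalar being its value on the basis.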

This concludes our discussion of multilinear algebra. The reader might 
well charge that the discussion was not very strongly motivated. The 
complete motivation cannot be contained in this book; the justification for 
studying multilinear algebra is the wide applicability of the subject. The 
only application that we shall make is to the theory of determinants (which, 
to be sure, could be treated by more direct but less elegant methods, involving
much greater dependence on arbitrary choices of bases); that application
belongs to the next chapter.

EXERCISES


1. Interpret the following matrices as linear transformations on @’ or ©’ and, in 
each case find a basis such that the matrix of the transformation with respect to 
that basis is triangular. 


2. Give an example of a skew-symmetric multilinear form that is not alternating. 
(Recall that in view of the discussion in § 30 the field of scalars must have charac- 
teristic 2.) 


3. Give an example of a non-zero alternating k-linear form w on an n-dimensional
space (k < n), such that $w(x_1, \dots, x_k) = 0$ for some set of linearly independent
vectors $x_1, \dots, x_k$.


4. What is the dimension of the space of all symmetric k-linear forms? What 
about the skew-symmetric ones? What about the alternating ones? 


CHAPTER II 


TRANSFORMATIONS 


§ 32. Linear transformations 
We come now to the objects that really make vector spaces interesting. 


DEFINITION. A linear transformation (or operator) A on a vector space
$\mathcal{V}$ is a correspondence that assigns to every vector x in $\mathcal{V}$ a vector Ax
in $\mathcal{V}$, in such a way that

    $$A(\alpha x + \beta y) = \alpha Ax + \beta Ay$$

identically in the vectors x and y and the scalars $\alpha$ and $\beta$.


We make again the remark that we made in connection with the defini- 
tion of linear functionals, namely, that for a linear transformation A, as we 
defined it, $A0 = 0$. For this reason such transformations are sometimes
called homogeneous linear transformations. 

Before discussing any properties of linear transformations we give sev- 
eral examples. We shall not bother to prove that the transformations we 
mention are indeed linear; in all cases the verification of the equation that 
defines linearity is a simple exercise. 

(1) Two special transformations of considerable importance for the study
that follows, and for which we shall consistently reserve the symbols 0 and
1 respectively, are defined (for all x) by $0x = 0$ and $1x = x$.

(2) Let $x_0$ be any fixed vector in $\mathcal{V}$, and let $y_0$ be any linear functional
on $\mathcal{V}$; write $Ax = y_0(x) \cdot x_0$. More generally: let $\{x_1, \dots, x_n\}$ be an
arbitrary finite set of vectors in $\mathcal{V}$ and let $\{y_1, \dots, y_n\}$ be a corresponding set
of linear functionals on $\mathcal{V}$; write $Ax = y_1(x)x_1 + \cdots + y_n(x)x_n$. It is not
difficult to prove that if, in particular, $\mathcal{V}$ is n-dimensional, and the vectors
$x_1, \dots, x_n$ form a basis for $\mathcal{V}$, then every linear transformation A has the
form just described.

(3) Let $\pi$ be a permutation of the integers $\{1, \dots, n\}$; if $x = (\xi_1, \dots, \xi_n)$
is a vector in $\mathcal{C}^n$, write $Ax = (\xi_{\pi(1)}, \dots, \xi_{\pi(n)})$. Similarly, let $\pi$ be a
polynomial with complex coefficients; if x is a vector (polynomial) in $\mathcal{P}$, write
$Ax = y$ for the polynomial defined by $y(t) = x(\pi(t))$.

(4) For any x in $\mathcal{P}_n$, $x(t) = \sum_{j=0}^{n-1} \xi_j t^j$, write $(Dx)(t) = \sum_{j=0}^{n-1} j \xi_j t^{j-1}$.
(We use the letter D here as a reminder that Dx is the derivative of the
polynomial x. We remark that we might have defined D on $\mathcal{P}$ as well as
on $\mathcal{P}_n$; we shall make use of this fact later. Observe that for polynomials
the definition of differentiation can be given purely algebraically, and does
not need the usual theory of limiting processes.)

(5) For every x in $\mathcal{P}$, $x(t) = \sum_{j=0}^{n-1} \xi_j t^j$, write
$(Sx)(t) = \sum_{j=0}^{n-1} \frac{\xi_j}{j+1} t^{j+1}$.
(Once more we are disguising by algebraic notation a well-known analytic
concept. Just as in (4) $(Dx)(t)$ stood for $\frac{dx}{dt}$, so here $(Sx)(t)$ is the same as
$\int_0^t x(s)\, ds$.)

(6) Let m be a polynomial with complex coefficients in a variable t.
(We may, although it is not particularly profitable to do so, consider m
as an element of $\mathcal{P}$.) For every x in $\mathcal{P}$, we write Mx for the polynomial
defined by $(Mx)(t) = m(t)x(t)$. For later purposes we introduce a special
symbol; in case $m(t) = t$, we shall write T for the transformation M,
so that $(Tx)(t) = tx(t)$.


§ 33. Transformations as vectors 


We proceed now to derive certain elementary properties of, and relations 
among, linear transformations on a vector space. More particularly, we 
shall indicate several ways of making new transformations out of old ones; 
we shall generally be satisfied with giving the definition of the new trans- 
formations and we shall omit the proof of linearity. 

If A and B are linear transformations, we define their sum, S = A + B,
by the equation $Sx = Ax + Bx$ (for every x). We observe that the
commutativity and associativity of addition in $\mathcal{V}$ imply immediately that
the addition of linear transformations is commutative and associative.
Much more than this is true. If we consider the sum of any linear transformation
A and the linear transformation 0 (defined in the preceding section),
we see that A + 0 = A. If, for each A, we denote by $-A$ the transformation
defined by $(-A)x = -(Ax)$, we see that $A + (-A) = 0$, and
that the transformation $-A$, so defined, is the only linear transformation
B with the property that A + B = 0. To sum up: the properties of a
vector space, described in the axioms (A) of § 2, appear again in the set of
all linear transformations on the space; the set of all linear transformations
is an abelian group with respect to the operation of addition.

We continue in the same spirit. By now it will not surprise anybody if 
the axioms (B) and (C) of vector spaces are also satisfied by the set of all 
linear transformations. They are. For any A, and any scalar $\alpha$, we define
the product $\alpha A$ by the equation $(\alpha A)x = \alpha(Ax)$. Axioms (B) and (C) are
immediately verified; we sum up as follows. 


THEOREM. The set of all linear transformations on a vector space is itself
a vector space. 


We shall usually ignore this theorem; the reason is that we can say much 
more about linear transformations, and the mere fact that they form a 
vector space is used only very rarely. The “much more” that we can say 
is that there exists for linear transformations a more or less decent definition 
of multiplication, which we discuss in the next section. 


EXERCISES 


1. Prove that each of the correspondences described below is a linear trans- 
formation. 

(a) $\mathcal{V}$ is the set $\mathcal{C}$ of complex numbers regarded as a real vector space; Ax is the
complex conjugate of x.

(b) $\mathcal{V}$ is $\mathcal{P}$; if x is a polynomial, then $(Ax)(t) = x(t + 1) - x(t)$.

(c) $\mathcal{V}$ is the k-fold tensor product of a vector space with itself; A is such that
$A(x_1 \otimes \cdots \otimes x_k) = x_{\pi(1)} \otimes \cdots \otimes x_{\pi(k)}$, where $\pi$ is a permutation of $\{1, \dots, k\}$.

(d) $\mathcal{V}$ is the set of all k-linear forms on a vector space;
$(Aw)(x_1, \dots, x_k) = w(x_{\pi(1)}, \dots, x_{\pi(k)})$, where $\pi$ is a permutation of $\{1, \dots, k\}$.

(e) $\mathcal{V}$ is the set of all k-linear forms on a vector space; if w is in $\mathcal{V}$, then
$Aw = \sum \pi w$, where the summation is extended over all permutations $\pi$ in $\mathcal{S}_k$.

(f) Same as (e) except that $Aw = \sum (\operatorname{sgn} \pi)\, \pi w$.


2. Prove that if $\mathcal{V}$ is a finite-dimensional vector space, then the space of all
linear transformations on $\mathcal{V}$ is finite-dimensional, and find its dimension.


3. The concept of a “linear transformation,” as defined in the text, is too special
for some purposes. According to a more general definition, a linear transformation
from a vector space $\mathcal{U}$ to a vector space $\mathcal{V}$ over the same field is a correspondence A
that assigns to every vector x in $\mathcal{U}$ a vector Ax in $\mathcal{V}$ so that

    $$A(\alpha x + \beta y) = \alpha Ax + \beta Ay.$$

Prove that each of the correspondences described below is a linear transformation
in this generalized sense.

(a) $\mathcal{V}$ is the field of scalars of $\mathcal{U}$; A is a linear functional on $\mathcal{U}$.

(b) $\mathcal{U}$ is the direct sum of $\mathcal{V}$ with some other space; A maps each pair in $\mathcal{U}$ onto
its first coordinate.

(c) $\mathcal{V}$ is the quotient of $\mathcal{U}$ modulo a subspace; A maps each vector in $\mathcal{U}$ onto
the coset it determines.

(d) Let w be a bilinear functional on a direct sum $\mathcal{U} \oplus \mathcal{V}_0$. Let $\mathcal{V}$ be the dual
of $\mathcal{V}_0$, and define A to be the correspondence that assigns to each $x_0$ in $\mathcal{U}$ the linear
functional on $\mathcal{V}_0$ obtained from w by setting its first argument equal to $x_0$.


4. (a) Suppose that $\mathcal{U}$ and $\mathcal{V}$ are vector spaces over the same field. If A and
B are linear transformations from $\mathcal{U}$ to $\mathcal{V}$, if $\alpha$ and $\beta$ are scalars, and if

    $$Cx = \alpha Ax + \beta Bx$$

for each x in $\mathcal{U}$, then C is a linear transformation from $\mathcal{U}$ to $\mathcal{V}$.

(b) If we write, by definition, $C = \alpha A + \beta B$, then the set of all linear
transformations from $\mathcal{U}$ to $\mathcal{V}$ becomes a vector space with respect to this definition of
the linear operations.

(c) Prove that if $\mathcal{U}$ and $\mathcal{V}$ are finite-dimensional, then so is the space of all linear
transformations from $\mathcal{U}$ to $\mathcal{V}$, and find its dimension.


5. Suppose that $\mathcal{M}$ is an m-dimensional subspace of an n-dimensional vector
space $\mathcal{V}$. Prove that the set of those linear transformations A on $\mathcal{V}$ for which
$Ax = 0$ whenever x is in $\mathcal{M}$ is a subspace of the set of all linear transformations on
$\mathcal{V}$, and find the dimension of that subspace.


§ 34. Products 


The product P of two linear transformations A and B, P = AB, is defined
by the equation $Px = A(Bx)$.

The notion of multiplication is fundamental for all that follows. Before 
giving any examples to illustrate the meaning of transformation products, 
let us observe the implications of the symbolism, P = AB. To say that 
P is a transformation means, of course, that given a vector x, P does some- 
thing to it. What it does is found out by operating on x with B, that is, 
finding Bx, and then operating on the result with A. In other words, if 
we look on the symbol for a transformation as a recipe for performing a 
certain act, then the symbol for the product of two transformations is to 
be read from right to left. The order to transform by AB means to trans- 
form first by B and then by A. This may seem like an undue amount of 
fuss to raise about a small point; however, as we shall soon see, transforma- 
tion multiplication is, in general, not commutative, and the order in 
which we transform makes a lot of difference. 

The most notorious example of non-commutativity is found on the
space $\mathcal{P}$. We consider the differentiation and multiplication transformations
D and T, defined by $(Dx)(t) = \frac{dx}{dt}$ and $(Tx)(t) = tx(t)$; we have

    $$(DTx)(t) = \frac{d}{dt}\bigl(tx(t)\bigr) = x(t) + t\,\frac{dx}{dt}$$

and

    $$(TDx)(t) = t\,\frac{dx}{dt}.$$

In other words, not only is it false that DT = TD (so that $DT - TD \neq 0$),
but, in fact, $(DT - TD)x = x$ for every x, so that $DT - TD = 1$.
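
The relation DT − TD = 1 can be checked on a sample polynomial. The sketch below assumes a polynomial is stored as its list of coefficients, lowest degree first; that representation and the helper `minus` are conveniences introduced here, not part of the text.

```python
def D(x):
    """Differentiation: sum xi_j t^j  ->  sum j xi_j t^(j-1)."""
    return [j * c for j, c in enumerate(x)][1:]

def T(x):
    """Multiplication by t: every coefficient moves up one degree."""
    return [0] + list(x)

def minus(x, y):
    """Coefficientwise difference x - y, with trailing zeros removed."""
    n = max(len(x), len(y))
    x = list(x) + [0] * (n - len(x))
    y = list(y) + [0] * (n - len(y))
    d = [a - b for a, b in zip(x, y)]
    while d and d[-1] == 0:
        d.pop()
    return d

x = [5, -1, 0, 7]                      # the polynomial 5 - t + 7t^3
commutator = minus(D(T(x)), T(D(x)))   # (DT - TD)x
print(D(T(x)), T(D(x)), commutator)
```

Reading the products from right to left, `D(T(x))` applies T first and D second, exactly as the symbol DT prescribes.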

On the basis of the examples in § 32, the reader should be able to con- 
struct many examples of pairs of non-commutative transformations. Those 
who are used to thinking of linear transformations geometrically can, for 
example, readily convince themselves that the product of two rotations of 
$\mathcal{R}^3$ (about the origin) depends in general on the order in which they are
performed. 

Most of the formal algebraic properties of numerical multiplication (with 
the already mentioned notable exception of commutativity) are valid in 
the algebra of transformations. Thus we have 


(1) A0 = 0A = 0,
(2) A1 = 1A = A,
(3) A(B + C) = AB + AC, 
(4) (A + B)C = AC + BC, 
(5) A(BC) = (AB)C. 


The proofs of all these identities are immediate consequences of the defini- 
tions of addition and multiplication; to illustrate the principle we prove (3), 
one of the distributive laws. The proof consists of the following computa- 
tion: 


    $$(A(B + C))x = A((B + C)x) = A(Bx + Cx)$$
    $$= A(Bx) + A(Cx) = (AB)x + (AC)x$$
    $$= (AB + AC)x.$$


§ 35. Polynomials 


The associative law of multiplication enables us to write the product of 
three (or more) factors without any parentheses; in particular we may 
consider the product of any finite number, say, m, of factors all equal to 
A. This product depends only on A and on m (and not, as we just remarked,
on any bracketing of the factors); we shall denote it by $A^m$. The
justification for this notation is that, although in general transformation
multiplication is not commutative, for the powers of one transformation
we do have the usual laws of exponents, $A^n A^m = A^{n+m}$ and $(A^n)^m = A^{nm}$.
We observe that $A^1 = A$; it is customary also to write, by definition,
$A^0 = 1$. With these definitions the calculus of powers of a single transformation
is almost exactly the same as in ordinary arithmetic. We may,
in particular, define polynomials in a linear transformation. Thus if p
is any polynomial with scalar coefficients in a variable t, say
$p(t) = \alpha_0 + \alpha_1 t + \cdots + \alpha_n t^n$, we may form the linear transformation

    $$p(A) = \alpha_0 \cdot 1 + \alpha_1 A + \cdots + \alpha_n A^n.$$


The rules for the algebraic manipulation of such polynomials are easy. 
Thus $p(t)q(t) = r(t)$ implies $p(A)q(A) = r(A)$ (so that, in particular, any
p(A) and q(A) are commutative); if $p(t) = \alpha$ (identically), we shall usually
write $p(A) = \alpha$ (instead of $p(A) = \alpha \cdot 1$); this is in harmony with the use
of the symbols 0 and 1 for linear transformations. 
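
These rules can be spot-checked with a small computation. The sketch below assumes that a transformation is given by a square array of integers (a representation that the text takes up only later) and evaluates p(A) by Horner's rule; all function names are illustrative.

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scalar(n, a):
    """The transformation a * 1 on an n-dimensional space."""
    return [[a if i == j else 0 for j in range(n)] for i in range(n)]

def poly_of(coeffs, A):
    """p(A) = a_0 1 + a_1 A + ... + a_n A^n, evaluated by Horner's rule."""
    n = len(A)
    result = scalar(n, 0)
    for a in reversed(coeffs):
        result = mat_add(mat_mul(result, A), scalar(n, a))
    return result

def poly_mul(p, q):
    """Coefficients of the product polynomial p(t) q(t)."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

A = [[1, 2], [3, 4]]
p = [1, 1]                 # p(t) = 1 + t
q = [2, 0, 1]              # q(t) = 2 + t^2
r = poly_mul(p, q)         # r(t) = p(t) q(t)
print(mat_mul(poly_of(p, A), poly_of(q, A)) == poly_of(r, A))
print(mat_mul(poly_of(p, A), poly_of(q, A)) == mat_mul(poly_of(q, A), poly_of(p, A)))
```

Both comparisons succeed for every choice of A, p, and q, since only powers of the single transformation A are involved.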

If p is a polynomial in two variables and if A and B are linear transforma- 
tions, it is not usually possible to give any sensible interpretation to p(A, B). 
The trouble, of course, is that A and B may not commute, and even a simple 
monomial, such as $s^2 t$, will cause confusion. If $p(s, t) = s^2 t$, what should
we mean by p(A, B)? Should it be $A^2 B$, or $ABA$, or $BA^2$? It is important
to recognize that there is a difficulty here; fortunately for us it is not necessary
to try to get around it. We shall work with polynomials in several
variables only in connection with commutative transformations, and then
everything is simple. We observe that if AB = BA, then $A^n B^m = B^m A^n$,
and therefore p(A, B) has an unambiguous meaning for every polynomial
p. The formal properties of the correspondence between (commutative)
transformations and polynomials are just as valid for several variables as 
for one; we omit the details. 

For an example of the possible behavior of the powers of a transformation
we look at the differentiation transformation D on $\mathcal{P}$ (or, just as well, on
$\mathcal{P}_n$, for some n). It is easy to see that for every positive integer k, and for
every polynomial x in $\mathcal{P}$, we have $(D^k x)(t) = \frac{d^k x}{dt^k}$. We observe that whatever
else D does, it lowers the degree of the polynomial on which it acts
by exactly one unit (assuming, of course, that the degree is $\geq 1$). Let
x be a polynomial of degree n − 1, say; what is $D^n x$? Or put it another
way: what is the product of the two (commutative) transformations $D^k$
and $D^{n-k}$ (where k is any integer between 0 and n), considered on the
space $\mathcal{P}_n$? We mention this example to bring out the disconcerting fact
implied by the answer to the last question; the product of two transforma- 
tions may vanish even though neither one of them is zero. A non-zero 
transformation whose product with some non-zero transformation is zero 
is called a divisor of zero. 
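
A small computation makes the phenomenon concrete. The sketch assumes that an element of $\mathcal{P}_n$ is stored as a list of n coefficients, lowest degree first, and takes n = 4 and k = 1 as sample values; the helpers `power` and `is_zero` are introduced here for illustration only.

```python
def D(x):
    """Differentiation on P_n, acting on a list of n coefficients."""
    n = len(x)
    return [(j + 1) * x[j + 1] for j in range(n - 1)] + [0]

def power(f, k):
    """The k-th power of the transformation f (k-fold application)."""
    def g(x):
        for _ in range(k):
            x = f(x)
        return x
    return g

def is_zero(f, n):
    """A linear transformation is 0 exactly when it kills every basis vector."""
    basis = [[1 if i == j else 0 for i in range(n)] for j in range(n)]
    return all(all(c == 0 for c in f(b)) for b in basis)

n, k = 4, 1
first, second = power(D, k), power(D, n - k)
product = lambda x: first(second(x))
print(is_zero(first, n), is_zero(second, n), is_zero(product, n))
```

Neither factor vanishes, yet the product does: each of the two powers of D is a divisor of zero on $\mathcal{P}_n$.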

EXERCISES


1. Calculate the linear transformations $D^n S^n$ and $S^n D^n$, n = 1, 2, 3, ...; in
other words, compute the effect of each such transformation on an arbitrary
element of $\mathcal{P}$. (Here D and S denote the differentiation and integration
transformations defined in § 32.)


2. If A and B are linear transformations such that AB − BA commutes with
A, then $A^k B - B A^k = k A^{k-1}(AB - BA)$ for every positive integer k.


3. Suppose that $Ax(t) = x(t + 1)$ for every x in $\mathcal{P}_n$; prove that if D is the
differentiation operator, then

    $$1 + \frac{D}{1!} + \frac{D^2}{2!} + \cdots + \frac{D^{n-1}}{(n-1)!} = A.$$

4. (a) If A is a linear transformation on an n-dimensional vector space, then
there exists a non-zero polynomial p of degree $\leq n^2$ such that $p(A) = 0$.

(b) If $Ax = y_0(x)x_0$ (see § 32, (2)), find a non-zero polynomial p such that
$p(A) = 0$. What is the smallest possible degree p can have?


5. The product of linear transformations between different vector spaces is
defined only if they “match” in the following sense. Suppose that $\mathcal{U}$, $\mathcal{V}$, and $\mathcal{W}$
are vector spaces over the same field, and suppose that A and B are linear
transformations from $\mathcal{U}$ to $\mathcal{V}$ and from $\mathcal{V}$ to $\mathcal{W}$, respectively. The product C = BA
(the order is important) is defined to be the linear transformation from $\mathcal{U}$ to $\mathcal{W}$
given by $Cx = B(Ax)$. Interpret and prove as many as possible among the
equations § 34, (1)-(5) for this concept of multiplication.


6. Let A be a linear transformation on an n-dimensional vector space $\mathcal{V}$.

(a) Prove that the set of all those linear transformations B on $\mathcal{V}$ for which
AB = 0 is a subspace of the space of all linear transformations on $\mathcal{V}$.

(b) Show that by a suitable choice for A the dimension of the subspace
described in (a) can be made to equal 0, or n, or $n^2$. What values can this dimension
attain?

(c) Can every subspace of the space of all linear transformations be obtained 
in the manner described in (a) (by the choice of a suitable A)? 


7. Let A be a linear transformation on a vector space $\mathcal{V}$, and consider the
correspondence that assigns to each linear transformation X on $\mathcal{V}$ the linear
transformation AX. Prove that this correspondence is a linear transformation (on the space
of all linear transformations). Can every linear transformation on that space be 
obtained in this manner (by the choice of a suitable A)? 


§ 36. Inverses 


In each of the two preceding sections we gave an example; these two 
examples bring out the two nasty properties that the multiplication of 
linear transformations has, namely, non-commutativity and the existence 
of divisors of zero. We turn now to the more pleasant properties that 
linear transformations sometimes have.

It may happen that the linear transformation A has one or both of the
following two very special properties.


(i) If $x_1 \neq x_2$, then $Ax_1 \neq Ax_2$.
(ii) To every vector y there corresponds (at least) one vector x such that
$Ax = y$.


If ever A has both these properties we shall say that A is invertible. If
A is invertible, we define a linear transformation, called the inverse of
A and denoted by $A^{-1}$, as follows. If $y_0$ is any vector, we may (by (ii))
find an $x_0$ for which $Ax_0 = y_0$. This $x_0$ is, moreover, uniquely determined,
since $x_0 \neq x_1$ implies (by (i)) that $y_0 = Ax_0 \neq Ax_1$. We define $A^{-1}y_0$
to be $x_0$. To prove that $A^{-1}$ is linear, we evaluate $A^{-1}(\alpha_1 y_1 + \alpha_2 y_2)$. If
$Ax_1 = y_1$ and $Ax_2 = y_2$, then the linearity of A tells us that
$A(\alpha_1 x_1 + \alpha_2 x_2) = \alpha_1 y_1 + \alpha_2 y_2$, so that
$A^{-1}(\alpha_1 y_1 + \alpha_2 y_2) = \alpha_1 x_1 + \alpha_2 x_2 = \alpha_1 A^{-1} y_1 + \alpha_2 A^{-1} y_2$.

As a trivial example of an invertible transformation we mention the 
identity transformation 1; clearly $1^{-1} = 1$. The transformation 0 is not
invertible; it violates both the conditions (i) and (ii) about as strongly as 
they can be violated. 

It is immediate from the definition that for any invertible A we have 


    $$AA^{-1} = A^{-1}A = 1;$$

we shall now show that these equations serve to characterize $A^{-1}$.


THEOREM 1. If A, B, and C are linear transformations such that

    $$AB = CA = 1,$$

then A is invertible and $A^{-1} = B = C$.


PROOF. If $Ax_1 = Ax_2$, then $CAx_1 = CAx_2$, so that (since CA = 1)
$x_1 = x_2$; in other words, the first condition of the definition of invertibility
is satisfied. The second condition is also satisfied, for if y is any vector and
x = By, then y = ABy = Ax. Multiplying AB = 1 on the left, and
CA = 1 on the right, by $A^{-1}$, we see that $A^{-1} = B = C$.

To show that neither AB = 1 nor CA = 1 is, by itself, sufficient to 
ensure the invertibility of A, we call attention to the differentiation and 
integration transformations D and S, defined in § 32, (4) and (5). Although
DS = 1, neither D nor S is invertible; D violates (i), and S violates (ii). 
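
The behavior of D and S is easy to observe on coefficient lists. In the sketch below a polynomial is assumed to be stored as its list of coefficients, lowest degree first, with exact rational arithmetic supplied by Python's `fractions` module; the helper `trim` is an illustrative convenience.

```python
from fractions import Fraction

def D(x):
    """Differentiation: sum xi_j t^j  ->  sum j xi_j t^(j-1)."""
    return [j * c for j, c in enumerate(x)][1:]

def S(x):
    """Integration from 0: sum xi_j t^j  ->  sum xi_j/(j+1) t^(j+1)."""
    return [Fraction(0)] + [Fraction(c) / (j + 1) for j, c in enumerate(x)]

def trim(x):
    """Remove trailing zero coefficients."""
    x = list(x)
    while x and x[-1] == 0:
        x.pop()
    return x

x = [3, 0, 5]                          # the polynomial 3 + 5t^2
print(trim(D(S(x))) == x)              # DS acts as the identity on x
print(trim(S(D(x))) == x)              # SD does not: the constant term is lost
print(D([1]) == D([2]))                # D sends distinct constants to the same vector
print(S(x)[0])                         # every Sx has constant term 0
```

The third line exhibits the failure of (i) for D, and the fourth shows why S fails (ii): a polynomial with non-zero constant term is never of the form Sx.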

In finite-dimensional spaces the situation is much simpler. 


THEOREM 2. A linear transformation A on a finite-dimensional vector
space $\mathcal{V}$ is invertible if and only if $Ax = 0$ implies that $x = 0$, or,
alternatively, if and only if every y in $\mathcal{V}$ can be written in the form $y = Ax$.


PROOF. If A is invertible, both conditions are satisfied; this much is
trivial. Suppose now that $Ax = 0$ implies that $x = 0$. Then $u \neq v$,
that is, $u - v \neq 0$, implies that $A(u - v) \neq 0$, that is, that $Au \neq Av$;
this proves (i). To prove (ii), let $\{x_1, \dots, x_n\}$ be a basis in $\mathcal{V}$; we assert
that $\{Ax_1, \dots, Ax_n\}$ is also a basis. According to § 8, Theorem 2, we
need only prove linear independence. But $\sum_i \alpha_i Ax_i = 0$ means
$A(\sum_i \alpha_i x_i) = 0$, and, by hypothesis, this implies that $\sum_i \alpha_i x_i = 0$; the linear
independence of the $x_i$ now tells us that $\alpha_1 = \cdots = \alpha_n = 0$. It follows, of
course, that every vector y may be written in the form
$y = \sum_i \alpha_i Ax_i = A(\sum_i \alpha_i x_i)$.

Let us assume next that every y is an Ax, and let $\{y_1, \dots, y_n\}$ be any
basis in $\mathcal{V}$. Corresponding to each $y_i$ we may find a (not necessarily unique)
$x_i$ for which $y_i = Ax_i$; we assert that $\{x_1, \dots, x_n\}$ is also a basis. For
$\sum_i \alpha_i x_i = 0$ implies $\sum_i \alpha_i Ax_i = \sum_i \alpha_i y_i = 0$, so that $\alpha_1 = \cdots = \alpha_n = 0$.
Consequently every x may be written in the form $x = \sum_i \alpha_i x_i$, and $Ax = 0$
implies, as in the argument just given, that $x = 0$.


THEOREM 3. If A and B are invertible, then AB is invertible and
$(AB)^{-1} = B^{-1}A^{-1}$. If A is invertible and $\alpha \neq 0$, then $\alpha A$ is invertible and
$(\alpha A)^{-1} = \frac{1}{\alpha} A^{-1}$. If A is invertible, then $A^{-1}$ is invertible and $(A^{-1})^{-1} = A$.


PROOF. According to Theorem 1, it is sufficient to prove (for the first
statement) that the product of AB with $B^{-1}A^{-1}$, in both orders, is the
identity; this verification we leave to the reader. The proofs of both the
remaining statements are identical in principle with this proof of the first
statement; the last statement, for example, follows from the fact that the
equations $AA^{-1} = A^{-1}A = 1$ are completely symmetric in A and $A^{-1}$.

We conclude our discussion of inverses with the following comment.
In the spirit of the preceding section we may, if we like, define rational
functions of A, whenever possible, by using $A^{-1}$. We shall not find it
useful to do this, except in one case: if A is invertible, then we know that
$A^n$ is also invertible, n = 1, 2, ...; we shall write $A^{-n}$ for $(A^n)^{-1}$, so that
$A^{-n} = (A^{-1})^n$.


EXERCISES 
1. Which of the linear transformations described in § 33, Ex. 1 are invertible? 
2. A linear transformation A is defined on $\mathcal{C}^2$ by

    $$A(\xi_1, \xi_2) = (\alpha \xi_1 + \beta \xi_2, \gamma \xi_1 + \delta \xi_2),$$

where $\alpha$, $\beta$, $\gamma$, and $\delta$ are fixed scalars. Prove that A is invertible if and only if
$\alpha\delta - \beta\gamma \neq 0$.

3. If A and B are linear transformations (on the same vector space), then a
necessary and sufficient condition that both A and B be invertible is that both AB
and BA be invertible.


4. If A and B are linear transformations on a finite-dimensional vector space, 
and if AB = 1, then both A and B are invertible. 


5. (a) If A, B, C, and D are linear transformations (all on the same vector space), 
and if both A + B and A — B are invertible, then there exist linear transforma- 
tions X and Y such that 

AX+BY=C 
and 
BX + AY =D. 


(b) To what extent are the invertibility assumptions in (a) necessary? 


6. (a) A linear transformation on a finite-dimensional vector space is invertible
if and only if it preserves linear independence. To say that A preserves linear
independence means that whenever $\mathcal{X}$ is a linearly independent set in the space $\mathcal{V}$
on which A acts, then $A\mathcal{X}$ is also a linearly independent set in $\mathcal{V}$. (The symbol
$A\mathcal{X}$ denotes, of course, the set of all vectors of the form Ax, with x in $\mathcal{X}$.)

(b) Is the assumption of finite-dimensionality needed for the validity of (a)? 


7. Show that if A is a linear transformation such that $A^2 - A + 1 = 0$, then
A is invertible. 


8. If A and B are linear transformations (on the same vector space) and if AB 
= 1, then A is called a left inverse of B and B is called a right inverse of A. Prove 
that if A has exactly one right inverse, say B, then A is invertible. (Hint: consider 
$BA + B - 1$.)

9. If A is an invertible linear transformation on a finite-dimensional vector
space $\mathcal{V}$, then there exists a polynomial p such that $A^{-1} = p(A)$. (Hint: find a
non-zero polynomial q of least degree such that $q(A) = 0$ and prove that its constant
term cannot be 0.)


10. Devise a sensible definition of invertibility for linear transformations from 
one vector space to another. Using that definition, decide which (if any) of the 
linear transformations described in § 33, Ex. 3 are invertible. 


§ 37. Matrices 


Let us now pick up the loose threads; having introduced the new concept 
of linear transformation, we must now find out what it has to do with the 
old concepts of bases, linear functionals, etc. 

One of the most important tools in the study of linear transformations 
on finite-dimensional vector spaces is the concept of a matrix. Since this 
concept usually has no decent analogue in infinite-dimensional spaces, 
and since it is possible in most considerations to do without it, we shall try 
not to use it in proving theorems. It is, however, important to know what 
a matrix is; we enter now into the detailed discussion.

DEFINITION. Let $\mathcal{V}$ be an n-dimensional vector space, let $\mathcal{X} = \{x_1, \dots, x_n\}$
be any basis of $\mathcal{V}$, and let A be a linear transformation on $\mathcal{V}$. Since
every vector is a linear combination of the $x_i$, we have in particular

    $$Ax_j = \sum_i \alpha_{ij} x_i$$

for $j = 1, \dots, n$. The set $(\alpha_{ij})$ of $n^2$ scalars, indexed with the double
subscript i, j, is the matrix of A in the coordinate system $\mathcal{X}$; we shall
generally denote it by [A], or, if it becomes necessary to indicate the
particular basis $\mathcal{X}$ under consideration, by $[A; \mathcal{X}]$. A matrix $(\alpha_{ij})$ is
usually written in the form of a square array:

    $$[A] = \begin{bmatrix}
    \alpha_{11} & \alpha_{12} & \cdots & \alpha_{1n} \\
    \alpha_{21} & \alpha_{22} & \cdots & \alpha_{2n} \\
    \vdots & \vdots & & \vdots \\
    \alpha_{n1} & \alpha_{n2} & \cdots & \alpha_{nn}
    \end{bmatrix};$$

the scalars $(\alpha_{i1}, \dots, \alpha_{in})$ form a row, and $(\alpha_{1j}, \dots, \alpha_{nj})$ a column, of [A].


This definition does not define “matrix” ; it defines “the matrix associated 
under certain conditions with a linear transformation.” It is often useful 
to consider a matrix as something existing in its own right as a square 
array of scalars; in general, however, a matrix in this book will be tied up 
with a linear transformation and a basis. 

We comment on notation. It is customary to use the same symbol,
say, A, for the matrix as for the transformation. The justification for 
this is to be found in the discussion below (of properties of matrices). 
We do not follow this custom here, because one of our principal aims, in 
connection with matrices, is to emphasize that they depend on a coordinate 
system (whereas the notion of linear transformation does not), and to 
study how the relation between matrices and linear transformations changes 
as we pass from one coordinate system to another. 

We call attention also to a peculiarity of the indexing of the elements 
aij of a matrix [A]. A basis is a basis, and so far, although we usually 
indexed its elements with the first n positive integers, the order of the 
elements in it was entirely immaterial. It is customary, however, when 
speaking of matrices, to refer to, say, the first row or the first column.
This language is justified only if we think of the elements of the basis $\mathcal{X}$
as arranged in a definite order. Since in the majority of our considerations 
the order of the rows and the columns of a matrix is as irrelevant as the 
order of the elements of a basis, we did not include this aspect of matrices 
in our definition. It is important, however, to realize that the appearance 
of the square array associated with [A] varies with the ordering of $\mathcal{X}$.
Everything we shall say about matrices can, accordingly, be interpreted
from two different points of view; either in strict accordance with the 
letter of our definition, or else following a modified definition which makes 
correspond a matrix (with ordered rows and columns) not merely to a 
linear transformation and a basis, but also to an ordering of the basis. 

One more word to those in the know. It is a perversity not of the author, 
but of nature, that makes us write 

\[ A x_j = \sum_i \alpha_{ij} x_i \]

instead of the more usual equation 

\[ A x_i = \sum_j \alpha_{ij} x_j. \]

The reason is that we want the formulas for matrix multiplication and for 
the application of matrices to numerical vectors (that is, vectors 
$(\xi_1, \dots, \xi_n)$ in $\mathcal{C}^n$) to appear normal, and somewhere in the process 
of passing from vectors to their coordinates the indices turn around. To 
state our rule explicitly: write $A x_j$ as a linear combination of 
$x_1, \dots, x_n$, and write the coefficients so obtained as the $j$-th column of 
the matrix $[A]$. (The first index on $\alpha_{ij}$ is always the row index; the 
second one, the column index.) 
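For an example we consider the differentiation transformation $D$ on 
the space $\mathcal{P}_n$, and the basis $\{x_1, \dots, x_n\}$ defined by $x_i(t) = t^{i-1}$, 
$i = 1, \dots, n$. What is the matrix of $D$ in this basis? We have 

\[
\begin{aligned}
D x_1 &= 0x_1 + 0x_2 + \cdots + 0x_{n-1} + 0x_n, \\
D x_2 &= 1x_1 + 0x_2 + \cdots + 0x_{n-1} + 0x_n, \\
D x_3 &= 0x_1 + 2x_2 + \cdots + 0x_{n-1} + 0x_n, \\
      &\;\;\vdots \\
D x_n &= 0x_1 + 0x_2 + \cdots + (n-1)x_{n-1} + 0x_n,
\end{aligned}
\tag{1}
\]

so that 

\[
[D] = \begin{pmatrix}
0 & 1 & 0 & \cdots & 0 & 0 \\
0 & 0 & 2 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 0 & n-1 \\
0 & 0 & 0 & \cdots & 0 & 0
\end{pmatrix}.
\tag{2}
\]

The unpleasant phenomenon of indices turning around is seen by comparing 
(1) and (2). 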
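
The rule "the coefficients of $D x_j$ form the $j$-th column" can be checked mechanically. What follows is a minimal sketch in Python, not part of the original text. The names `derivative_matrix` and `apply` are illustrative choices of ours, and a polynomial is assumed to be stored as the list of its coordinates with respect to $1, t, \dots, t^{n-1}$.

```python
def derivative_matrix(n):
    """Matrix of D on polynomials of degree <= n-1 in the basis x_i(t) = t^(i-1)."""
    # Column j holds the coordinates of D x_j, and D x_j = (j-1) x_{j-1} for j >= 2.
    D = [[0] * n for _ in range(n)]
    for j in range(2, n + 1):      # 1-based column index, as in the text
        D[j - 2][j - 1] = j - 1    # row j-1, column j (converted to 0-based)
    return D


def apply(matrix, coords):
    """Apply a matrix to a coordinate vector: eta_i = sum_j alpha_ij * xi_j."""
    return [sum(a * x for a, x in zip(row, coords)) for row in matrix]


D = derivative_matrix(4)
# 5 + 3t + 2t^2 + 7t^3 has coordinates (5, 3, 2, 7);
# its derivative is 3 + 4t + 21t^2.
derivative = apply(D, [5, 3, 2, 7])
```

With $n = 4$ the matrix `D` has the entries 1, 2, 3 just above the diagonal, in agreement with (2).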
§ 38. Matrices of transformations 


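There is now a certain amount of routine work to be done, most of which 
we shall leave to the imagination. The problem is this: in a fixed coordinate 
system $\mathcal{X} = \{x_1, \dots, x_n\}$, knowing the matrices of $A$ and $B$, how can we 
find the matrices of $\alpha A + \beta B$, of $AB$, of $0$, $1$, etc.? 

Write $[A] = (\alpha_{ij})$, $[B] = (\beta_{ij})$, $C = \alpha A + \beta B$, $[C] = (\gamma_{ij})$; we assert that 

\[ \gamma_{ij} = \alpha\alpha_{ij} + \beta\beta_{ij}; \]

also if $[0] = (o_{ij})$ and $[1] = (e_{ij})$, then 

\[ o_{ij} = 0 \]

and 

\[ e_{ij} = \delta_{ij} \quad (= \text{the Kronecker delta}). \]

A more complicated rule is the following: if $C = AB$, $[C] = (\gamma_{ij})$, then 

\[ \gamma_{ij} = \sum_k \alpha_{ik}\beta_{kj}. \]

To prove this we use the definition of the matrix associated with a 
transformation, and juggle, thus: 

\[
\begin{aligned}
C x_j = A(B x_j) &= A\Bigl(\sum_k \beta_{kj} x_k\Bigr) = \sum_k \beta_{kj} A x_k \\
&= \sum_k \beta_{kj}\Bigl(\sum_i \alpha_{ik} x_i\Bigr) = \sum_i \Bigl(\sum_k \alpha_{ik}\beta_{kj}\Bigr) x_i.
\end{aligned}
\]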
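
The product rule says that the matrix of $AB$ acts on coordinates exactly as "first $B$, then $A$" does. The following Python fragment is a small sketch of that statement, offered as an illustration only. The helper names `mat_mul` and `apply` and the sample matrices are our own assumptions, and vectors are identified with their coordinate lists.

```python
def mat_mul(a, b):
    """Product of square matrices: gamma_ij = sum_k alpha_ik * beta_kj."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def apply(matrix, coords):
    """Apply a matrix to a coordinate vector: eta_i = sum_j alpha_ij * xi_j."""
    return [sum(m * x for m, x in zip(row, coords)) for row in matrix]


A = [[1, 2], [0, 1]]
B = [[0, 1], [3, 4]]
x = [5, -2]

via_composition = apply(A, apply(B, x))   # A(Bx)
via_product = apply(mat_mul(A, B), x)     # [AB] applied to x
```

Both routes give the same coordinates, which is the content of the juggling above.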


The relation between transformations and matrices is exactly the same 
as the relation between vectors and their coordinates, and the analogue 
of the isomorphism theorem of § 9 is true in the best possible sense. We 
shall make these statements precise. 

With the aid of a fixed basis $\mathcal{X}$, we have made correspond a matrix $[A]$ 
to every linear transformation $A$; the correspondence is described by the 
relations $A x_j = \sum_i \alpha_{ij} x_i$. We assert now that this correspondence is 
one-to-one (that is, that the matrices of two different transformations are 
different), and that every array $(\alpha_{ij})$ of $n^2$ scalars is the matrix of some 
transformation. To prove this, we observe in the first place that knowledge 
of the matrix of $A$ completely determines $A$ (that is, that $Ax$ is thereby 
uniquely defined for every $x$), as follows: if $x = \sum_j \xi_j x_j$, then 

\[ Ax = \sum_j \xi_j A x_j = \sum_j \xi_j \Bigl(\sum_i \alpha_{ij} x_i\Bigr) = \sum_i \Bigl(\sum_j \alpha_{ij}\xi_j\Bigr) x_i. \]

(In other words, if $y = Ax = \sum_i \eta_i x_i$, then 

\[ \eta_i = \sum_j \alpha_{ij}\xi_j. \]

Compare this with the comments in § 37 on the perversity of indices.) 
In the second place, there is no law against reading the relation 
$A x_j = \sum_i \alpha_{ij} x_i$ backwards. If, in other words, $(\alpha_{ij})$ is any array, we may use 
this relation to define a linear transformation $A$; it is clear that the matrix 
of $A$ will be exactly $(\alpha_{ij})$. (Once more, however, we emphasize the 
fundamental fact that this one-to-one correspondence between transformations 
and matrices was set up by means of a particular coordinate system, and 
that, as we pass from one coordinate system to another, the same linear 
transformation may correspond to several matrices, and one matrix 
may be the correspondent of many linear transformations.) The following 
statement sums up the essential part of the preceding discussion. 


THEOREM. Among the set of all matrices $(\alpha_{ij})$, $(\beta_{ij})$, etc., $i, j = 1, \dots, n$ 
(not considered in relation to linear transformations), we define sum, scalar 
multiplication, product, $(o_{ij})$, and $(e_{ij})$, by 

\[
\begin{aligned}
(\alpha_{ij}) + (\beta_{ij}) &= (\alpha_{ij} + \beta_{ij}), \\
\alpha(\alpha_{ij}) &= (\alpha\alpha_{ij}), \\
(\alpha_{ij})(\beta_{ij}) &= \Bigl(\sum_k \alpha_{ik}\beta_{kj}\Bigr), \\
o_{ij} = 0, &\quad e_{ij} = \delta_{ij}.
\end{aligned}
\]

Then the correspondence (established by means of an arbitrary coordinate 
system $\mathcal{X} = \{x_1, \dots, x_n\}$ of the $n$-dimensional vector space $\mathcal{V}$), between 
all linear transformations $A$ on $\mathcal{V}$ and all matrices $(\alpha_{ij})$, described by 
$A x_j = \sum_i \alpha_{ij} x_i$, is an isomorphism; in other words, it is a one-to-one 
correspondence that preserves sum, scalar multiplication, product, 0, and 1. 

We have carefully avoided discussing the matrix of $A^{-1}$. It is possible 
to give an expression for $[A^{-1}]$ in terms of the elements $\alpha_{ij}$ of $[A]$, but the 
expression is not simple and, fortunately, not useful for us. 


EXERCISES 


1. Let $A$ be the linear transformation on $\mathcal{P}_n$ defined by $(Ax)(t) = x(t + 1)$, and 
let $\{x_0, \dots, x_{n-1}\}$ be the basis of $\mathcal{P}_n$ defined by $x_j(t) = t^j$, $j = 0, \dots, n-1$. Find 
the matrix of $A$ with respect to this basis. 

2. Find the matrix of the operation of conjugation on $\mathcal{C}$, considered as a real 
vector space, with respect to the basis $\{1, i\}$ (where $i = \sqrt{-1}$). 

3. (a) Let $\pi$ be a permutation of the integers $1, \dots, n$; if $x = (\xi_1, \dots, \xi_n)$ is 
a vector in $\mathcal{C}^n$, write $Ax = (\xi_{\pi(1)}, \dots, \xi_{\pi(n)})$. If $x_i = (\delta_{i1}, \dots, \delta_{in})$, find the matrix 
of $A$ with respect to $\{x_1, \dots, x_n\}$. 

(b) Find all matrices that commute with the matrix of $A$. 


4. Consider the vector space consisting of all real two-by-two matrices and let A 
be the linear transformation on this space that sends each matrix X onto PX, where 


x aE 1 
P= G `) . Find the matrix of A with respect to the basis consisting of ( é 


God G od G 
5. Consider the vector space consisting of all linear transformations on a vector 
space $\mathcal{V}$, and let $A$ be the (left) multiplication transformation that sends each 
transformation $X$ on $\mathcal{V}$ onto $PX$, where $P$ is some prescribed transformation on $\mathcal{V}$. 
Under what conditions on $P$ is $A$ invertible? 


6. Prove that if $I$, $J$, and $K$ are the complex matrices 

\[
\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \quad
\begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix}, \quad
\begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}
\]

respectively (where $i = \sqrt{-1}$), then $I^2 = J^2 = K^2 = -1$, $IJ = -JI = K$, 
$JK = -KJ = I$, and $KI = -IK = J$. 


7. (a) Prove that if $A$, $B$, and $C$ are linear transformations on a two-dimensional 
vector space, then $(AB - BA)^2$ commutes with $C$. 
(b) Is the conclusion of (a) true for higher-dimensional spaces? 


8. Let $A$ be the linear transformation on $\mathcal{C}^2$ defined by $A(\xi_1, \xi_2) = (\xi_1 + \xi_2, \xi_2)$. 
Prove that if a linear transformation $B$ commutes with $A$, then there exists 
a polynomial $p$ such that $B = p(A)$. 


9. For which of the following polynomials p and matrices A is it true that p(A) = 
0? 


111 
@ no=e-ae4a—1 4 =(1 1 1) 


00 1 
1 1 1 
(b) p(t) = Ë — 3t, A = ( 1 i) 
i ł 1 
1 10 
Om=e+e+itna=i 1 r) 
0141 
0 1 0 
(d) p(t) = È — 2t, A = ( 0 1) 
010 
10. Prove that if A and B are the complex matrices 
0100 + 0 00 
0010 ind 0—1 00 
0001 0 oO -% 0 
1000 0 0 01 


respectively (where 1 = V —1), and if C = AB — iBA, then C?+0?+0C =0. 


11. If A and B are linear transformations on a vector space, and if AB = 0, 
does it follow that BA = 0? 


12. What happens to the matrix of a linear transformation on a finite-dimensional 
vector space when the elements of the basis with respect to which the matrix is 
computed are permuted among themselves? 




13. (a) Suppose that $\mathcal{V}$ is a finite-dimensional vector space with basis $\{x_1, \dots, x_n\}$. 
Suppose that $\alpha_1, \dots, \alpha_n$ are pairwise distinct scalars. If $A$ is a linear 
transformation such that $A x_j = \alpha_j x_j$, $j = 1, \dots, n$, and if $B$ is a linear transformation 
that commutes with $A$, then there exist scalars $\beta_1, \dots, \beta_n$ such that $B x_j = \beta_j x_j$. 

(b) Prove that if $B$ is a linear transformation on a finite-dimensional vector 
space $\mathcal{V}$ and if $B$ commutes with every linear transformation on $\mathcal{V}$, then $B$ is a 
scalar (that is, there exists a scalar $\beta$ such that $Bx = \beta x$ for all $x$ in $\mathcal{V}$). 


14. If $\{x_1, \dots, x_k\}$ and $\{y_1, \dots, y_k\}$ are linearly independent sets of vectors in 
a finite-dimensional vector space $\mathcal{V}$, then there exists an invertible linear 
transformation $A$ on $\mathcal{V}$ such that $A x_j = y_j$, $j = 1, \dots, k$. 


15. If a matrix $[A] = (\alpha_{ij})$ is such that $\alpha_{ii} = 0$, $i = 1, \dots, n$, then there exist 
matrices $[B] = (\beta_{ij})$ and $[C] = (\gamma_{ij})$ such that $[A] = [B][C] - [C][B]$. (Hint: try 
$\beta_{ij} = \beta_i\delta_{ij}$.) 


16. Decide which of the following matrices are invertible and find the inverses 
of the ones that are. 
0 
(e) ( 
1 


1 1 1 
(a) e i) 0 
0 
1 1 
(b) h vy 101 
01 (£) (: 0 i) 
(c) (o 0): 101 
0 1 
0 i 
oli TE 
0 1 
17. For which values of $\alpha$ are the following matrices invertible? Find the 
inverses whenever possible. 
l a 
(0) G a): 


© o) 
o (i o @( 4) 


18. For which values of $\alpha$ are the following matrices invertible? Find the 
inverses whenever possible. 


1a 0 0 le 
(a) (= 1 a) (c) (: a . 
Oai a 0 
a 1 0 1 i 
(b) (: a ij a(i 1 
Ole l a 
19. (a) It is easy to extend matrix theory to linear transformations between 
different vector spaces. Suppose that $\mathcal{U}$ and $\mathcal{V}$ are vector spaces over the same 
field, let $\{x_1, \dots, x_n\}$ and $\{y_1, \dots, y_m\}$ be bases of $\mathcal{U}$ and $\mathcal{V}$ respectively, and 
let $A$ be a linear transformation from $\mathcal{U}$ to $\mathcal{V}$. The matrix of $A$ is, by definition, 
the rectangular, $m$ by $n$, array of scalars defined by 

\[ A x_j = \sum_i \alpha_{ij} y_i. \]

Define addition and multiplication of rectangular matrices so as to generalize as 
many as possible of the results of § 38. (Note that the product of an $m_1$ by $n_1$ 
matrix and an $m_2$ by $n_2$ matrix, in that order, will be defined only if $n_1 = m_2$.) 

(b) Suppose that $A$ and $B$ are multipliable matrices. Partition $A$ into four 
rectangular blocks (top left, top right, bottom left, bottom right) and then partition 
$B$ similarly so that the number of columns in the top left part of $A$ is the same as 
the number of rows in the top left part of $B$. If, in an obvious shorthand, these 
partitions are indicated by 

\[
A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \qquad
B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix},
\]

then 

\[
AB = \begin{pmatrix}
A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\
A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22}
\end{pmatrix}.
\]

(c) Use subspaces and complements to express the result of (b) in terms of 
linear transformations (instead of matrices). 

(d) Generalize both (b) and (c) to larger numbers of pieces (instead of four). 


§ 39. Invariance 


A possible relation between subspaces $\mathcal{M}$ of a vector space and linear 
transformations $A$ on that space is invariance. We say that $\mathcal{M}$ is invariant 
under $A$ if $x$ in $\mathcal{M}$ implies that $Ax$ is in $\mathcal{M}$. (Observe that the implication 
relation is required in one direction only; we do not assume that every 
$y$ in $\mathcal{M}$ can be written in the form $y = Ax$ with $x$ in $\mathcal{M}$; we do not even 
assume that $Ax$ in $\mathcal{M}$ implies $x$ in $\mathcal{M}$. Presently we shall see examples in 
which the conditions we did not assume definitely fail to hold.) We 
know that a subspace of a vector space is itself a vector space; if we know 
that $\mathcal{M}$ is invariant under $A$, we may ignore the fact that $A$ is defined 
outside $\mathcal{M}$ and we may consider $A$ as a linear transformation defined on 
the vector space $\mathcal{M}$. Invariance is often considered for sets of linear 
transformations, as well as for a single one; $\mathcal{M}$ is invariant under a set if 
it is invariant under each member of the set. 

What can be said about the matrix of a linear transformation $A$ on an 
$n$-dimensional vector space $\mathcal{V}$ if we know that some $\mathcal{M}$ is invariant under 
$A$? In other words: is there a clever way of selecting a basis 
$\mathcal{X} = \{x_1, \dots, x_n\}$ in $\mathcal{V}$ so that $[A] = [A; \mathcal{X}]$ will have some particularly simple form? 
The answer is in § 12, Theorem 2; we may choose $\mathcal{X}$ so that $x_1, \dots, x_m$ 
are in $\mathcal{M}$ and $x_{m+1}, \dots, x_n$ are not. Let us express $A x_j$ in terms of $x_1, \dots, x_n$. 
For $m + 1 \le j \le n$, there is not much we can say: $A x_j = \sum_i \alpha_{ij} x_i$. For 
$1 \le j \le m$, however, $x_j$ is in $\mathcal{M}$, and therefore (since $\mathcal{M}$ is invariant under 
$A$) $A x_j$ is in $\mathcal{M}$. Consequently, in this case $A x_j$ is a linear combination of 
$x_1, \dots, x_m$; the $\alpha_{ij}$ with $m + 1 \le i \le n$ are zero. Hence the matrix $[A]$ 
of $A$, in this coordinate system, will have the form 

\[
[A] = \begin{pmatrix} [A_1] & [B_0] \\ [0] & [A_2] \end{pmatrix},
\]

where $[A_1]$ is the ($m$-rowed) matrix of $A$ considered as a linear 
transformation on the space $\mathcal{M}$ (with respect to the coordinate system $\{x_1, \dots, x_m\}$), 
$[A_2]$ and $[B_0]$ are some arrays of scalars (in size $(n - m)$ by $(n - m)$ and 
$m$ by $(n - m)$ respectively), and $[0]$ denotes the rectangular ($(n - m)$ by $m$) 
array consisting of zeros only. (It is important to observe the unpleasant 
fact that $[B_0]$ need not be zero.) 
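
The block form just described is easy to see on the differentiation matrix of § 37. The following Python sketch (not from the original text) splits that matrix for polynomials of degree at most 3 at $m = 2$, using the fact that the span of $1$ and $t$ is invariant under $D$. The helper `blocks` is a hypothetical name of ours.

```python
def blocks(matrix, m):
    """Split a square matrix at index m into ([A1], [B0], lower-left block, [A2])."""
    a1 = [row[:m] for row in matrix[:m]]
    b0 = [row[m:] for row in matrix[:m]]
    lower_left = [row[:m] for row in matrix[m:]]
    a2 = [row[m:] for row in matrix[m:]]
    return a1, b0, lower_left, a2


# Matrix of differentiation in the basis 1, t, t^2, t^3 (column j = coordinates of D x_j).
D = [[0, 1, 0, 0],
     [0, 0, 2, 0],
     [0, 0, 0, 3],
     [0, 0, 0, 0]]

# The span of 1 and t is invariant under D, so we split at m = 2.
a1, b0, lower_left, a2 = blocks(D, 2)
```

Here `lower_left` consists of zeros only, while `b0` is not zero: it contains the entry 2 coming from $D t^2 = 2t$. This is the unpleasant fact mentioned above.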


§ 40. Reducibility 


A particularly important subcase of the notion of invariance is that of 
reducibility. If $\mathcal{M}$ and $\mathcal{N}$ are two subspaces such that both are invariant 
under $A$ and such that $\mathcal{V}$ is their direct sum, then $A$ is reduced (decomposed) 
by the pair $(\mathcal{M}, \mathcal{N})$. The difference between invariance and reducibility 
is that, in the former case, among the collection of all subspaces invariant 
under $A$ we may not be able to pick out any two, other than $\mathcal{O}$ and $\mathcal{V}$, with 
the property that $\mathcal{V}$ is their direct sum. Or, saying it the other way, if 
$\mathcal{M}$ is invariant under $A$, there are, to be sure, many ways of finding an 
$\mathcal{N}$ such that $\mathcal{V} = \mathcal{M} \oplus \mathcal{N}$, but it may happen that no such $\mathcal{N}$ will be 
invariant under $A$. 

The process described above may also be turned around. Let $\mathcal{M}$ and 
$\mathcal{N}$ be any two vector spaces, and let $A$ and $B$ be any two linear 
transformations (on $\mathcal{M}$ and $\mathcal{N}$ respectively). Let $\mathcal{V}$ be the direct sum $\mathcal{M} \oplus \mathcal{N}$; we 
may define on $\mathcal{V}$ a linear transformation $C$ called the direct sum of $A$ and 
$B$, by writing 

\[ Cz = C(x, y) = (Ax, By). \]

We shall omit the detailed discussion of direct sums of transformations; 
we shall merely mention the results. Their proofs are easy. If $(\mathcal{M}, \mathcal{N})$ 
reduces $C$, and if we denote by $A$ the linear transformation $C$ considered 
on $\mathcal{M}$ alone, and by $B$ the linear transformation $C$ considered on $\mathcal{N}$ alone, 
then $C$ is the direct sum of $A$ and $B$. By suitable choice of basis (namely, 
by choosing $x_1, \dots, x_m$ in $\mathcal{M}$ and $x_{m+1}, \dots, x_n$ in $\mathcal{N}$) we may put the 
matrix of the direct sum of $A$ and $B$ in the form displayed in the preceding 
section, with $[A_1] = [A]$, $[B_0] = [0]$, and $[A_2] = [B]$. If $p$ is any 
polynomial, and if we write $A' = p(A)$, $B' = p(B)$, then the direct sum $C'$ 
of $A'$ and $B'$ will be $p(C)$. 
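
In matrix terms the direct sum is a block-diagonal matrix, and the last statement says that a polynomial may be applied block by block. The Python fragment below is a hedged illustration of this, not part of the original text. The polynomial $p(t) = t^2 + 3t + 2$, the sample matrices, and all function names are assumptions made for the example.

```python
def mat_mul(a, b):
    """Matrix product, gamma_ij = sum_k alpha_ik * beta_kj."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_add(a, b):
    """Entrywise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(r, s)] for r, s in zip(a, b)]


def identity(n):
    """The n-by-n matrix (delta_ij)."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def direct_sum(a, b):
    """Block-diagonal matrix with a in the top left and b in the bottom right."""
    m, n = len(a), len(b)
    top = [row + [0] * n for row in a]
    bottom = [[0] * m + row for row in b]
    return top + bottom


def p(a):
    """The sample polynomial p(t) = t^2 + 3t + 2 evaluated at a square matrix."""
    three_a = [[3 * x for x in row] for row in a]
    two = [[2 * x for x in row] for row in identity(len(a))]
    return mat_add(mat_add(mat_mul(a, a), three_a), two)


A = [[1, 1], [0, 1]]
B = [[2]]
C = direct_sum(A, B)
```

For these matrices `p(C)` coincides with `direct_sum(p(A), p(B))`.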
EXERCISES 

1. Suppose that the matrix of a linear transformation (on a two-dimensional 


vector space) with respect to some coordinate system is G °) . How many sub- 
spaces are there invariant under the transformation? 


2. Give an example of a linear transformation $A$ on a finite-dimensional vector 
space $\mathcal{V}$ such that $\mathcal{O}$ and $\mathcal{V}$ are the only subspaces invariant under $A$. 


3. Let $D$ be the differentiation operator on $\mathcal{P}_n$. If $m < n$, then the subspace 
$\mathcal{P}_m$ is invariant under $D$. Is $D$ on $\mathcal{P}_m$ invertible? Is there a complement of $\mathcal{P}_m$ 
in $\mathcal{P}_n$ such that it together with $\mathcal{P}_m$ reduces $D$? 


4. Prove that the subspace spanned by two subspaces, each of which is invariant 
under some linear transformation A, is itself invariant under A. 


§ 41. Projections 


Especially important for our purposes is another connection between 
direct sums and linear transformations. 


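DEFINITION. If $\mathcal{V}$ is the direct sum of $\mathcal{M}$ and $\mathcal{N}$, so that every $z$ in $\mathcal{V}$ 
may be written, uniquely, in the form $z = x + y$, with $x$ in $\mathcal{M}$ and $y$ 
in $\mathcal{N}$, the projection on $\mathcal{M}$ along $\mathcal{N}$ is the transformation $E$ defined by 
$Ez = x$. 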


If direct sums are important, then projections are also, since, as we shall 
see, they are a very powerful algebraic tool in studying the geometric 
concept of direct sum. The reader will easily satisfy himself about the 
reason for the word “projection” by drawing a pair of axes (linear manifolds) 
in the plane (their direct sum). To make the picture look general enough, 
do not draw perpendicular axes! 

We skipped over one point whose proof is easy enough to skip over, but 
whose existence should be recognized; it must be shown that E is a linear 
transformation. We leave this verification to the reader, and go on to 
look for special properties of projections. 


THEOREM 1. A linear transformation $E$ is a projection on some subspace 
if and only if it is idempotent, that is, $E^2 = E$. 

PROOF. If $E$ is the projection on $\mathcal{M}$ along $\mathcal{N}$, and if $z = x + y$, with 
$x$ in $\mathcal{M}$ and $y$ in $\mathcal{N}$, then the decomposition of $x$ is $x + 0$, so that 

\[ E^2 z = EEz = Ex = x = Ez. \]

Conversely, suppose that $E^2 = E$. Let $\mathcal{N}$ be the set of all vectors $z$ 
in $\mathcal{V}$ for which $Ez = 0$; let $\mathcal{M}$ be the set of all vectors $z$ for which $Ez = z$. 
It is clear that both $\mathcal{M}$ and $\mathcal{N}$ are subspaces; we shall prove that 
$\mathcal{V} = \mathcal{M} \oplus \mathcal{N}$. In view of the theorem of § 18, we need to prove that $\mathcal{M}$ and $\mathcal{N}$ 
are disjoint and that together they span $\mathcal{V}$. 

If $z$ is in $\mathcal{M}$, then $Ez = z$; if $z$ is in $\mathcal{N}$, then $Ez = 0$; hence if $z$ is in both 
$\mathcal{M}$ and $\mathcal{N}$, then $z = 0$. For an arbitrary $z$ we have 

\[ z = Ez + (1 - E)z. \]

If we write $Ez = x$ and $(1 - E)z = y$, then $Ex = E^2 z = Ez = x$, and 
$Ey = E(1 - E)z = Ez - E^2 z = 0$, so that $x$ is in $\mathcal{M}$ and $y$ is in $\mathcal{N}$. This 
proves that $\mathcal{V} = \mathcal{M} \oplus \mathcal{N}$, and that the projection on $\mathcal{M}$ along $\mathcal{N}$ is precisely 
$E$. 

As an immediate consequence of the above proof we obtain also the 
following result. 


THEOREM 2. If $E$ is the projection on $\mathcal{M}$ along $\mathcal{N}$, then $\mathcal{M}$ and $\mathcal{N}$ are, 
respectively, the sets of all solutions of the equations $Ez = z$ and $Ez = 0$. 


By means of these two theorems we can remove the apparent asymmetry, 
in the definition of projections, between the roles played by $\mathcal{M}$ and $\mathcal{N}$. 
If to every $z = x + y$ we make correspond not $x$ but $y$, we also get an 
idempotent linear transformation. This transformation (namely, $1 - E$) 
is the projection on $\mathcal{N}$ along $\mathcal{M}$. We sum up the facts as follows. 


THEOREM 3. A linear transformation $E$ is a projection if and only if 
$1 - E$ is a projection; if $E$ is the projection on $\mathcal{M}$ along $\mathcal{N}$, then $1 - E$ 
is the projection on $\mathcal{N}$ along $\mathcal{M}$. 
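
The three theorems of this section can be watched at work on a pair of non-perpendicular axes in the plane. The sketch below is only an illustration under assumptions of our own: $\mathcal{M}$ is taken to be the span of $(1, 0)$, $\mathcal{N}$ the span of $(1, 1)$, vectors are coordinate lists, and the names `mat_mul`, `apply`, `E`, `F` are ours. Since $(\alpha, \beta) = (\alpha - \beta, 0) + (\beta, \beta)$, the projection on $\mathcal{M}$ along $\mathcal{N}$ sends $(\alpha, \beta)$ to $(\alpha - \beta, 0)$.

```python
def mat_mul(a, b):
    """Product of square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def apply(matrix, coords):
    """Apply a matrix to a coordinate vector."""
    return [sum(m * x for m, x in zip(row, coords)) for row in matrix]


# Projection on span{(1, 0)} along span{(1, 1)}: (a, b) -> (a - b, 0).
E = [[1, -1],
     [0, 0]]
# The complementary projection 1 - E: (a, b) -> (b, b).
F = [[0, 1],
     [0, 1]]

z = [5, 2]
x = apply(E, z)   # the component of z in M
y = apply(F, z)   # the component of z in N
```

Both `E` and `F` are idempotent, `x` lies on the first axis, `y` lies on the line spanned by $(1, 1)$, and the two components add up to `z`.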


§ 42. Combinations of projections 


Continuing in the spirit of Theorem 3 of the preceding section, we in- 
vestigate conditions under which various algebraic combinations of projec- 
tions are themselves projections. 


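THEOREM. We assume that $E_1$ and $E_2$ are projections on $\mathcal{M}_1$ and $\mathcal{M}_2$ 
along $\mathcal{N}_1$ and $\mathcal{N}_2$ respectively and that the underlying field of scalars is 
such that $1 + 1 \ne 0$. We make three assertions. 

(i) $E_1 + E_2$ is a projection if and only if $E_1E_2 = E_2E_1 = 0$; if this 
condition is satisfied, then $E = E_1 + E_2$ is the projection on $\mathcal{M}$ along $\mathcal{N}$, 
where $\mathcal{M} = \mathcal{M}_1 \oplus \mathcal{M}_2$ and $\mathcal{N} = \mathcal{N}_1 \cap \mathcal{N}_2$. 

(ii) $E_1 - E_2$ is a projection if and only if $E_1E_2 = E_2E_1 = E_2$; if 
this condition is satisfied, then $E = E_1 - E_2$ is the projection on $\mathcal{M}$ along 
$\mathcal{N}$, where $\mathcal{M} = \mathcal{M}_1 \cap \mathcal{N}_2$ and $\mathcal{N} = \mathcal{N}_1 \oplus \mathcal{M}_2$. 

(iii) If $E_1E_2 = E_2E_1 = E$, then $E$ is the projection on $\mathcal{M}$ along $\mathcal{N}$, where 
$\mathcal{M} = \mathcal{M}_1 \cap \mathcal{M}_2$ and $\mathcal{N} = \mathcal{N}_1 + \mathcal{N}_2$. 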
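
PROOF. We recall the notation. If $\mathcal{H}$ and $\mathcal{K}$ are subspaces, then $\mathcal{H} + \mathcal{K}$ 
is the subspace spanned by $\mathcal{H}$ and $\mathcal{K}$; writing $\mathcal{H} \oplus \mathcal{K}$ implies that $\mathcal{H}$ and 
$\mathcal{K}$ are disjoint, and then $\mathcal{H} \oplus \mathcal{K} = \mathcal{H} + \mathcal{K}$; and $\mathcal{H} \cap \mathcal{K}$ is the intersection 
of $\mathcal{H}$ and $\mathcal{K}$. 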

(i) If $E_1 + E_2 = E$ is a projection, then $(E_1 + E_2)^2 = E^2 = E = E_1 + E_2$, 
so that the cross-product terms must disappear: 

\[ E_1E_2 + E_2E_1 = 0. \tag{1} \]

If we multiply (1) on both left and right by $E_1$ we obtain 

\[
\begin{aligned}
E_1E_2 + E_1E_2E_1 &= 0, \\
E_1E_2E_1 + E_2E_1 &= 0;
\end{aligned}
\]

subtracting, we get $E_1E_2 - E_2E_1 = 0$. Hence $E_1$ and $E_2$ are commutative, 
and (1) implies that their product is zero. (Here is where we need the 
assumption $1 + 1 \ne 0$.) Since, conversely, $E_1E_2 = E_2E_1 = 0$ clearly 
implies (1), we see that the condition is also sufficient to ensure that $E$ 
be a projection. 

Let us suppose, from now on, that $E$ is a projection; by § 41, Theorem 2, 
$\mathcal{M}$ and $\mathcal{N}$ are, respectively, the sets of all solutions of the equations $Ez = z$ 
and $Ez = 0$. Let us write $z = x_1 + y_1 = x_2 + y_2$, where $x_1 = E_1 z$ and 
$x_2 = E_2 z$ are in $\mathcal{M}_1$ and $\mathcal{M}_2$, respectively, and $y_1 = (1 - E_1)z$ and 
$y_2 = (1 - E_2)z$ are in $\mathcal{N}_1$ and $\mathcal{N}_2$, respectively. If $z$ is in $\mathcal{M}$, $E_1 z + E_2 z = z$, then 

\[ z = E_1(x_2 + y_2) + E_2(x_1 + y_1) = E_1 y_2 + E_2 y_1. \]

Since $E_1(E_1 y_2) = E_1 y_2$ and $E_2(E_2 y_1) = E_2 y_1$, we have exhibited $z$ as a sum 
of a vector from $\mathcal{M}_1$ and a vector from $\mathcal{M}_2$, so that $\mathcal{M} \subset \mathcal{M}_1 + \mathcal{M}_2$. 
Conversely, if $z$ is a sum of a vector from $\mathcal{M}_1$ and a vector from $\mathcal{M}_2$, then 
$(E_1 + E_2)z = z$, so that $z$ is in $\mathcal{M}$, and consequently $\mathcal{M} = \mathcal{M}_1 + \mathcal{M}_2$. 
Finally, if $z$ belongs to both $\mathcal{M}_1$ and $\mathcal{M}_2$, so that $E_1 z = E_2 z = z$, then 
$z = E_1 z = E_1(E_2 z) = 0$, so that $\mathcal{M}_1$ and $\mathcal{M}_2$ are disjoint; we have proved 
that $\mathcal{M} = \mathcal{M}_1 \oplus \mathcal{M}_2$. 

It remains to find $\mathcal{N}$, that is, to find all solutions of $E_1 z + E_2 z = 0$. If 
$z$ is in $\mathcal{N}_1 \cap \mathcal{N}_2$, this equation is clearly satisfied; conversely $E_1 z + E_2 z = 0$ 
implies (upon multiplication on the left by $E_1$ and $E_2$ respectively) that 
$E_1 z + E_1E_2 z = 0$ and $E_2E_1 z + E_2 z = 0$. Since $E_1E_2 z = E_2E_1 z = 0$ for 
all $z$, we obtain finally $E_1 z = E_2 z = 0$, so that $z$ belongs to both $\mathcal{N}_1$ and $\mathcal{N}_2$. 

With the technique and the results obtained in this proof, the proofs of 
the remaining parts of the theorem are easy. 

(ii) According to § 41, Theorem 3, $E_1 - E_2$ is a projection if and only 
if $1 - (E_1 - E_2) = (1 - E_1) + E_2$ is a projection. According to (i) 
this happens (since, of course, $1 - E_1$ is the projection on $\mathcal{N}_1$ along $\mathcal{M}_1$) 
if and only if 

\[ (1 - E_1)E_2 = E_2(1 - E_1) = 0, \tag{2} \]

and in this case $(1 - E_1) + E_2$ is the projection on $\mathcal{N}_1 \oplus \mathcal{M}_2$ along 
$\mathcal{M}_1 \cap \mathcal{N}_2$. Since (2) is equivalent to $E_1E_2 = E_2E_1 = E_2$, the proof of (ii) 
is complete. 

(iii) That $E = E_1E_2 = E_2E_1$ implies that $E$ is a projection is clear, 
since $E$ is idempotent. We assume, therefore, that $E_1$ and $E_2$ commute 
and we find $\mathcal{M}$ and $\mathcal{N}$. If $Ez = z$, then $E_1 z = E_1Ez = E_1E_1E_2 z = E_1E_2 z = z$, 
and similarly $E_2 z = z$, so that $z$ is contained in both $\mathcal{M}_1$ and $\mathcal{M}_2$. 
The converse is clear; if $E_1 z = z = E_2 z$, then $Ez = z$. Suppose next that 
$E_1E_2 z = 0$; it follows that $E_2 z$ belongs to $\mathcal{N}_1$, and, from the commutativity 
of $E_1$ and $E_2$, that $E_1 z$ belongs to $\mathcal{N}_2$. This is more symmetry than we 
need; since $z = E_2 z + (1 - E_2)z$, and since $(1 - E_2)z$ is in $\mathcal{N}_2$, we have 
exhibited $z$ as a sum of a vector from $\mathcal{N}_1$ and a vector from $\mathcal{N}_2$. Conversely 
if $z$ is such a sum, then $E_1E_2 z = 0$; this concludes the proof that 
$\mathcal{N} = \mathcal{N}_1 + \mathcal{N}_2$. 

We shall return to theorems of this type later, and we shall obtain, in 
certain cases, more precise results. Before leaving the subject, however, 
we call attention to a few minor peculiarities of the theorem of this section. 
We observe first that although in both (i) and (ii) one of $\mathcal{M}$ and $\mathcal{N}$ was a 
direct sum of the given subspaces, in (iii) we stated only that $\mathcal{N} = \mathcal{N}_1 + \mathcal{N}_2$. 
Consideration of the possibility $E_1 = E_2 = E$ shows that this is 
unavoidable. Also: the condition of (iii) was asserted to be sufficient only; it is 
possible to construct projections $E_1$ and $E_2$ whose product $E_1E_2$ is a 
projection, but for which $E_1E_2$ and $E_2E_1$ are different. Finally, it may be 
conjectured that it is possible to extend the result of (i), by induction, to more 
than two summands. Although this is true, it is surprisingly non-trivial; 
we shall prove it later in a special case of interest. 
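
Assertion (i) of the theorem lends itself to a quick numerical check. The Python sketch below is an illustration of ours, not part of the original text. The matrices are sample projections chosen for the purpose, and the names `is_projection`, `good_sum`, and `bad_sum` are hypothetical.

```python
def mat_mul(a, b):
    """Product of square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_add(a, b):
    """Entrywise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(r, s)] for r, s in zip(a, b)]


def is_zero(a):
    """True when every entry of the matrix vanishes."""
    return all(x == 0 for row in a for x in row)


def is_projection(e):
    """By Theorem 1 of § 41, projections are exactly the idempotents."""
    return mat_mul(e, e) == e


# Two projections with E1 E2 = E2 E1 = 0: their sum is again a projection.
E1 = [[1, 1, 0], [0, 0, 0], [0, 0, 0]]
E2 = [[0, 0, 0], [0, 0, 0], [0, 0, 1]]
good_sum = mat_add(E1, E2)

# Two projections whose product does not vanish: the sum is not idempotent.
F1 = [[1, 0], [0, 0]]
F2 = [[1, 1], [0, 0]]
bad_sum = mat_add(F1, F2)
```

In the first pair both products vanish and `good_sum` is idempotent; in the second pair `F1 F2` is not zero and `bad_sum` fails the test, as (i) predicts.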


§ 43. Projections and invariance 


We have already seen that the study of projections is equivalent to the 
study of direct sum decompositions. By means of projections we may also 
study the notions of invariance and reducibility. 


THEOREM 1. If a subspace $\mathcal{M}$ is invariant under the linear transformation 
$A$, then $EAE = AE$ for every projection $E$ on $\mathcal{M}$. Conversely, if $EAE = AE$ 
for some projection $E$ on $\mathcal{M}$, then $\mathcal{M}$ is invariant under $A$. 


PROOF. Suppose that $\mathcal{M}$ is invariant under $A$ and that $\mathcal{V} = \mathcal{M} \oplus \mathcal{N}$ 
for some $\mathcal{N}$; let $E$ be the projection on $\mathcal{M}$ along $\mathcal{N}$. For any $z = x + y$ 
(with $x$ in $\mathcal{M}$ and $y$ in $\mathcal{N}$) we have $AEz = Ax$ and $EAEz = EAx$; since the 
presence of $x$ in $\mathcal{M}$ guarantees the presence of $Ax$ in $\mathcal{M}$, it follows that 
$EAx$ is also equal to $Ax$, as desired. 

Conversely, suppose that $\mathcal{V} = \mathcal{M} \oplus \mathcal{N}$, and that $EAE = AE$ for the 
projection $E$ on $\mathcal{M}$ along $\mathcal{N}$. If $x$ is in $\mathcal{M}$, then $Ex = x$, so that 

\[ EAx = EAEx = AEx = Ax, \]

and consequently $Ax$ is also in $\mathcal{M}$. 


THEOREM 2. If $\mathcal{M}$ and $\mathcal{N}$ are subspaces with $\mathcal{V} = \mathcal{M} \oplus \mathcal{N}$, then a 
necessary and sufficient condition that the linear transformation $A$ be reduced 
by the pair $(\mathcal{M}, \mathcal{N})$ is that $EA = AE$, where $E$ is the projection on $\mathcal{M}$ 
along $\mathcal{N}$. 


PROOF. First we assume that $EA = AE$, and we prove that $A$ is reduced 
by $(\mathcal{M}, \mathcal{N})$. If $x$ is in $\mathcal{M}$, then $Ax = AEx = EAx$, so that $Ax$ is also in 
$\mathcal{M}$; if $x$ is in $\mathcal{N}$, then $Ex = 0$ and $EAx = AEx = A0 = 0$, so that $Ax$ is 
also in $\mathcal{N}$. 

Next we assume that $A$ is reduced by $(\mathcal{M}, \mathcal{N})$, and we prove that 
$EA = AE$. Since $\mathcal{M}$ is invariant under $A$, Theorem 1 assures us that 
$EAE = AE$; since $\mathcal{N}$ is also invariant under $A$, and since $1 - E$ is a projection 
on $\mathcal{N}$, we have, similarly, $(1 - E)A(1 - E) = A(1 - E)$. From the 
second equation, after carrying out the indicated multiplications and 
simplifying, we obtain $EAE = EA$; this concludes the proof of the theorem. 
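
The two criteria, $EAE = AE$ for invariance and $EA = AE$ for reducibility, can be compared on a small example. The following Python sketch is ours and not part of the original text. The matrices `A`, `R`, `E`, and `E_oblique` are assumptions made for illustration, with $\mathcal{M}$ the span of $(1, 0)$ in the plane.

```python
def mat_mul(a, b):
    """Product of square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


A = [[2, 1], [0, 3]]             # leaves span{(1, 0)} invariant, but not span{(0, 1)}
R = [[2, 0], [0, 3]]             # reduced by the pair (span{(1, 0)}, span{(0, 1)})

E = [[1, 0], [0, 0]]             # projection on span{(1, 0)} along span{(0, 1)}
E_oblique = [[1, -1], [0, 0]]    # projection on span{(1, 0)} along span{(1, 1)}

AE = mat_mul(A, E)
EAE = mat_mul(E, AE)
EA = mat_mul(E, A)
```

For `A` the equation $EAE = AE$ holds with either projection on $\mathcal{M}$, in accordance with Theorem 1, while $EA \ne AE$; for `R` the projection `E` commutes with the transformation, as Theorem 2 requires.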


EXERCISES 


1. (a) Suppose that $E$ is a projection on a vector space $\mathcal{V}$, and suppose that 
scalar multiplication is redefined so that the new product of a scalar $\alpha$ and a vector 
$x$ is the old product of $\alpha$ and $Ex$. Show that vector addition (old) and scalar 
multiplication (new) satisfy all the axioms on a vector space except $1 \cdot x = x$. 

(b) To what extent is it true that the method described in (a) is the only way to 
construct systems satisfying all the axioms on a vector space except $1 \cdot x = x$? 

2. (a) Suppose that $\mathcal{V}$ is a vector space, $x_0$ is a vector in $\mathcal{V}$, and $y_0$ is a linear 
functional on $\mathcal{V}$; write $Ax = [x, y_0]x_0$ for every $x$ in $\mathcal{V}$. Under what conditions 
on $x_0$ and $y_0$ is $A$ a projection? 

(b) If $A$ is the projection on, say, $\mathcal{M}$ along $\mathcal{N}$, characterize $\mathcal{M}$ and $\mathcal{N}$ in terms 
of $x_0$ and $y_0$. 


3. If A is left multiplication by P on a space of linear transformations (cf. § 38 
Ex. 5), under what conditions on P is A a projection? 


4. If A is a linear transformation, if E is a projection, and if F = 1 — E, then 
A = EAE + EAF + FAE + FAF. 


Use this result to prove the multiplication rule for partitioned (square) matrices 
(as in § 38, Ex. 19). 

5. (a) If $E_1$ and $E_2$ are projections on $\mathcal{M}_1$ and $\mathcal{M}_2$ along $\mathcal{N}_1$ and $\mathcal{N}_2$ respectively, 
and if $E_1$ and $E_2$ commute, then $E_1 + E_2 - E_1E_2$ is a projection. 

(b) If $E_1 + E_2 - E_1E_2$ is the projection on $\mathcal{M}$ along $\mathcal{N}$, describe $\mathcal{M}$ and $\mathcal{N}$ in 
terms of $\mathcal{M}_1$, $\mathcal{M}_2$, $\mathcal{N}_1$, and $\mathcal{N}_2$. 

6. (a) Find a linear transformation $A$ such that $A^2(1 - A) = 0$ but $A$ is not 
idempotent. 

(b) Find a linear transformation $A$ such that $A(1 - A)^2 = 0$ but $A$ is not 
idempotent. 

(c) Prove that if $A$ is a linear transformation such that $A^2(1 - A) = A(1 - A)^2 = 0$, 
then $A$ is idempotent. 

7. (a) Prove that if $E$ is a projection on a finite-dimensional vector space, then 
there exists a basis $\mathcal{X}$ such that the matrix $(e_{ij})$ of $E$ with respect to $\mathcal{X}$ has the 
following special form: $e_{ij} = 0$ or $1$ for all $i$ and $j$, and $e_{ij} = 0$ if $i \ne j$. 

(b) An involution is a linear transformation $U$ such that $U^2 = 1$. Show that 
if $1 + 1 \ne 0$, then the equation $U = 2E - 1$ establishes a one-to-one 
correspondence between all projections $E$ and all involutions $U$. 

(c) What do (a) and (b) imply about the matrix of an involution on a finite- 
dimensional vector space? 

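8. (a) In the space $\mathcal{C}^2$ of all vectors $(\xi_1, \xi_2)$ let $\mathcal{M}^+$, $\mathcal{N}_1$, and $\mathcal{N}_2$ be the subspaces 
characterized by $\xi_1 = \xi_2$, $\xi_1 = 0$, and $\xi_2 = 0$, respectively. If $E_1$ and $E_2$ are the 
projections on $\mathcal{M}^+$ along $\mathcal{N}_1$ and $\mathcal{N}_2$ respectively, show that $E_1E_2 = E_2$ and 
$E_2E_1 = E_1$. 

(b) Let $\mathcal{M}^-$ be the subspace characterized by $\xi_1 = -\xi_2$. If $E_0$ is the projection 
on $\mathcal{N}_2$ along $\mathcal{M}^-$, then $E_2E_0$ is a projection, but $E_0E_2$ is not. 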


9. Show that if $E$, $F$, and $G$ are projections on a vector space over a field whose 
characteristic is not equal to 2, and if $E + F + G = 1$, then $EF = FE = EG 
= GE = FG = GF = 0$. Does the proof work for four projections instead of three? 


§ 44. Adjoints 


Let us study next the relation between the notions of linear 
transformation and dual space. Let $\mathcal{V}$ be any vector space and let $y$ be any element 
of $\mathcal{V}'$; for any linear transformation $A$ on $\mathcal{V}$ we consider the expression 
$[Ax, y]$. For each fixed $y$, the function $y'$ defined by $y'(x) = [Ax, y]$ is 
a linear functional on $\mathcal{V}$; using the square bracket notation for $y'$ as well 
as for $y$, we have $[Ax, y] = [x, y']$. If now we allow $y$ to vary over $\mathcal{V}'$, 
then this procedure makes correspond to each $y$ a $y'$, depending, of course, 
on $y$; we write $y' = A'y$. The defining property of $A'$ is 

\[ [Ax, y] = [x, A'y]. \tag{1} \]

We assert that $A'$ is a linear transformation on $\mathcal{V}'$. Indeed, if 
$y = \alpha_1 y_1 + \alpha_2 y_2$, then 

\[
\begin{aligned}
[x, A'y] = [Ax, y] &= \alpha_1[Ax, y_1] + \alpha_2[Ax, y_2] \\
&= \alpha_1[x, A'y_1] + \alpha_2[x, A'y_2] = [x, \alpha_1 A'y_1 + \alpha_2 A'y_2].
\end{aligned}
\]

The linear transformation $A'$ is called the adjoint (or dual) of $A$; we dedicate 
this section and the next to studying properties of $A'$. Let us first get the 
formal algebraic rules out of the way; they go as follows. 

\[
\begin{aligned}
0' &= 0, &\quad (2) \\
1' &= 1, &\quad (3) \\
(A + B)' &= A' + B', &\quad (4) \\
(\alpha A)' &= \alpha A', &\quad (5) \\
(AB)' &= B'A', &\quad (6) \\
(A^{-1})' &= (A')^{-1}. &\quad (7)
\end{aligned}
\]

Here (7) is to be interpreted in the following sense: if $A$ is invertible, 
then so is $A'$, and the equation is valid. The proofs of all these relations 
are elementary; to indicate the procedure, we carry out the computations 
for (6) and (7). To prove (6), merely observe that 

\[ [ABx, y] = [Bx, A'y] = [x, B'A'y]. \]

To prove (7), suppose that $A$ is invertible, so that $AA^{-1} = A^{-1}A = 1$. 
Applying (3) and (6) to these equations, we obtain 

\[ (A^{-1})'A' = A'(A^{-1})' = 1; \]

Theorem 1 of § 36 implies that $A'$ is invertible and that (7) is valid. 
In finite-dimensional spaces another important relation holds: 

\[ A'' = A. \tag{8} \]

This relation has to be read with a grain of salt. As it stands $A''$ is a 
transformation not on $\mathcal{V}$ but on the dual space $\mathcal{V}''$ of $\mathcal{V}'$. If, however, we identify 
$\mathcal{V}''$ and $\mathcal{V}$ according to the natural isomorphism, then $A''$ acts on $\mathcal{V}$ and 
(8) makes sense. In this interpretation the proof of (8) is trivial. Since 
$\mathcal{V}$ is reflexive, we obtain every linear functional on $\mathcal{V}'$ by considering 
$[x, y]$ as a function of $y$, with $x$ fixed in $\mathcal{V}$. Since $[x, A'y]$ defines a function 
(a linear functional) of $y$, it may be written in the form $[x', y]$. The vector 
$x'$ here is, by definition, $A''x$. Hence we have, for every $y$ in $\mathcal{V}'$ and for 
every $x$ in $\mathcal{V}$, 

\[ [Ax, y] = [x, A'y] = [A''x, y]; \]

the equality of the first and last terms of this chain proves (8). 

Under the hypothesis of (8) (that is, finite-dimensionality), the 
asymmetry in the interpretation of (7) may be removed; we assert that in this 
case the invertibility of $A'$ implies that of $A$ and, therefore, the validity 
of (7). Proof: apply the old interpretation of (7) to $A'$ and $A''$ in place 
of $A$ and $A'$. 

Our discussion is summed up, in the reflexive finite-dimensional case, 
by the assertion that the mapping $A \to A'$ is one-to-one, and, in fact, an 
algebraic anti-isomorphism, from the set of all linear transformations on 
$\mathcal{V}$ onto the set of all linear transformations on $\mathcal{V}'$. (The prefix “anti” got 
attached because of the commutation rule (6).) 
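
The defining property (1) and the commutation rule (6) can be tried out in coordinates. The sketch below is not part of the original text and rests on an assumption that should be stated loudly: vectors of the coordinate space are lists $(\xi_1, \dots, \xi_n)$, a linear functional $y$ is stored as its list of coefficients $(\eta_1, \dots, \eta_n)$, and $[x, y] = \sum_i \xi_i\eta_i$. Under this identification $A'y$ has the coefficients $\sum_i \eta_i\alpha_{ij}$. All function names are our own.

```python
def mat_mul(a, b):
    """Product of square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def apply(matrix, x):
    """Coordinates of Ax: eta_i = sum_j alpha_ij * xi_j."""
    return [sum(a * t for a, t in zip(row, x)) for row in matrix]


def adjoint_apply(matrix, y):
    """Coefficients of A'y (assumed representation): (A'y)_j = sum_i y_i * alpha_ij."""
    n = len(matrix)
    return [sum(y[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def pair(x, y):
    """The value [x, y] of the functional y at the vector x."""
    return sum(s * t for s, t in zip(x, y))


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 1]]
x = [2, -1]
y = [5, 7]

lhs = pair(apply(A, x), y)             # [Ax, y]
rhs = pair(x, adjoint_apply(A, y))     # [x, A'y]

adj_of_product = adjoint_apply(mat_mul(A, B), y)         # (AB)'y
reversed_order = adjoint_apply(B, adjoint_apply(A, y))   # B'(A'y)
same_order = adjoint_apply(A, adjoint_apply(B, y))       # A'(B'y)
```

The two pairings agree, and $(AB)'y$ equals $B'A'y$ but, for these sample matrices, not $A'B'y$.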


§ 45. Adjoints of projections 


There is one important case in which multiplication does not get turned 
around, that is, when $(AB)' = A'B'$; namely, the case when $A$ and $B$ 
commute. We have, in particular, $(A^n)' = (A')^n$, and, more generally, 
$(p(A))' = p(A')$ for every polynomial $p$. It follows from this that if $E$ 
is a projection, then so is $E'$. The question arises: what direct sum 
decomposition is $E'$ associated with? 


THEOREM 1. If $E$ is the projection on $\mathcal{M}$ along $\mathcal{N}$, then $E'$ is the projection 
on $\mathcal{N}^0$ along $\mathcal{M}^0$. 


PROOF. We know already that $(E')^2 = E'$ and $\mathcal{V}' = \mathcal{N}^0 \oplus \mathcal{M}^0$ (cf. 
§ 20). It is necessary only to find the subspaces consisting of the solutions 
of $E'y = 0$ and $E'y = y$. This we do in four steps. 

(i) If $y$ is in $\mathcal{M}^0$, then, for all $x$, 

\[ [x, E'y] = [Ex, y] = 0, \]

so that $E'y = 0$. 

(ii) If $E'y = 0$, then, for all $x$ in $\mathcal{M}$, 

\[ [x, y] = [Ex, y] = [x, E'y] = 0, \]

so that $y$ is in $\mathcal{M}^0$. 

(iii) If $y$ is in $\mathcal{N}^0$, then, for all $x$, 

\[ [x, y] = [Ex, y] + [(1 - E)x, y] = [Ex, y] = [x, E'y], \]

so that $E'y = y$. 

(iv) If $E'y = y$, then for all $x$ in $\mathcal{N}$, 

\[ [x, y] = [x, E'y] = [Ex, y] = 0, \]

so that $y$ is in $\mathcal{N}^0$. 

Steps (i) and (ii) together show that the set of solutions of $E'y = 0$ 
is precisely $\mathcal{M}^0$; steps (iii) and (iv) together show that the set of solutions 
of $E'y = y$ is precisely $\mathcal{N}^0$. This concludes the proof of the theorem. 


THEOREM 2. If $M$ is invariant under $A$, then $M^0$ is invariant under
$A'$; if $A$ is reduced by $(M, N)$, then $A'$ is reduced by $(M^0, N^0)$.

PROOF. We shall prove only the first statement; the second one clearly
follows from it. We first observe the following identity, valid for any
three linear transformations $E$, $F$, and $A$, subject to the relation $F = 1 - E$:


(1) $FAF - FA = EAE - AE$.


(Compare this with the proof of § 43, Theorem 2.) Let $E$ be any projection
on $M$; by § 43, Theorem 1, the right member of (1) vanishes, and, therefore,
so does the left member. By taking adjoints, we obtain $F'A'F' = A'F'$;
since, by Theorem 1 of the present section, $F' = 1 - E'$ is a projection on
$M^0$, the proof of Theorem 2 is complete. (Here is an alternative proof of
the first statement of Theorem 2, a proof that does not make use of the
fact that $V$ is the direct sum of $M$ and some other subspace. If $y$ is in
$M^0$, then $[x, A'y] = [Ax, y] = 0$ for all $x$ in $M$, and therefore $A'y$ is in $M^0$.
The only advantage of the algebraic proof given above over this simple
geometric proof is that the former prepares the ground for future work
with projections.)

We conclude our treatment of adjoints by discussing their matrices; 
this discussion is intended to illuminate the entire theory and to enable 
the reader to construct many examples. 

We shall need the following fact: if $X = \{x_1, \dots, x_n\}$ is any basis in the
$n$-dimensional vector space $V$, if $X' = \{y_1, \dots, y_n\}$ is the dual basis in
$V'$, and if the matrix of the linear transformation $A$ in the coordinate
system $X$ is $(\alpha_{ij})$, then


(2) $\alpha_{ij} = [Ax_j, y_i]$.


This follows from the definition of the matrix of a linear transformation;
since $Ax_j = \sum_k \alpha_{kj} x_k$, we have


$[Ax_j, y_i] = \sum_k \alpha_{kj} [x_k, y_i] = \alpha_{ij}$.


To keep things straight in the applications, we rephrase formula (2)
verbally, thus: to find the $(i, j)$ element of $[A]$ in the basis $X$, apply $A$ to
the $j$-th element of $X$ and then take the value of the $i$-th linear functional
(in $X'$) at the vector so obtained.

It is now very easy to find the matrix $(\alpha'_{ij}) = [A']$ in the coordinate
system $X'$; we merely follow the recipe just given. In other words, we
consider $A'y_j$, and take the value of the $i$-th linear functional in $X''$ (that
is, of $x_i$ considered as a linear functional on $V'$) at this vector; the result is
that

$\alpha'_{ij} = [x_i, A'y_j]$.

Since $[x_i, A'y_j] = [Ax_i, y_j] = \alpha_{ji}$, so that $\alpha'_{ij} = \alpha_{ji}$, this matrix $[A']$ is
called the transpose of $[A]$.

Observe that our results on the relation between $E$ and $E'$ (where $E$
is a projection) could also have been derived by using the facts about the
matricial representation of a projection together with the present result
on the matrices of adjoint transformations.
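
The recipe for $[A']$ can be checked numerically. The following is a minimal sketch, not part of the original text, under the assumption that vectors are given by their coordinates in $X$ and functionals by their coordinates in the dual basis $X'$, so that $[x, y]$ becomes a sum of products. The sample matrices and the helper names (`mat_vec`, `transpose`, `pairing`) are illustrative choices only.

```python
def mat_vec(a, x):
    """Apply the matrix a (a list of rows) to the coordinate vector x."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in a]

def transpose(a):
    """Interchange rows and columns: the matrix of the adjoint in the dual basis."""
    return [list(col) for col in zip(*a)]

def pairing(x, y):
    """[x, y] for x in coordinates of X and y in coordinates of the dual basis X'."""
    return sum(xi * yi for xi, yi in zip(x, y))

# Hypothetical sample data.
A = [[1, 2], [3, 4]]
x = [5, -1]
y = [2, 7]

lhs = pairing(mat_vec(A, x), y)             # [Ax, y]
rhs = pairing(x, mat_vec(transpose(A), y))  # [x, A'y]
print(lhs, rhs)

# E is the projection on M = span{(1, 0)} along N = span{(1, -1)}.
E = [[1, 1], [0, 0]]
E_adj = transpose(E)
# (1, 1) annihilates N and is fixed by E'; (0, 1) annihilates M and is sent to 0.
print(mat_vec(E_adj, [1, 1]), mat_vec(E_adj, [0, 1]))
```

In this small instance the transposed matrix fixes the annihilator of $N$ and kills the annihilator of $M$, in agreement with Theorem 1.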


§ 46. Change of basis 


Although what we have been doing with linear transformations so far 
may have been complicated, it was to a large extent automatic. Having 
introduced the new concept of linear transformation, we merely let some 
of the preceding concepts suggest ways in which they are connected with 
linear transformations. We now begin the proper study of linear trans- 
formations. As a first application of the theory we shall solve the problems 
arising from a change of basis. These problems can be formulated without 
mentioning linear transformations, but their solution is most effectively 
given in terms of linear transformations. 

Let $V$ be an $n$-dimensional vector space and let $X = \{x_1, \dots, x_n\}$ and
$Y = \{y_1, \dots, y_n\}$ be two bases in $V$. We may ask the following two
questions.

Question I. If $x$ is in $V$, $x = \sum_i \xi_i x_i = \sum_i \eta_i y_i$, what is the relation
between its coordinates $(\xi_1, \dots, \xi_n)$ with respect to $X$ and its coordinates
$(\eta_1, \dots, \eta_n)$ with respect to $Y$?

Question II. If $(\xi_1, \dots, \xi_n)$ is an ordered set of $n$ scalars, what is the
relation between the vectors $x = \sum_i \xi_i x_i$ and $y = \sum_i \xi_i y_i$?

Both these questions are easily answered in the language of linear
transformations. We consider, namely, the linear transformation $A$ defined
by $Ax_i = y_i$, $i = 1, \dots, n$. More explicitly:


$A(\sum_i \xi_i x_i) = \sum_i \xi_i y_i$.

Let $(\alpha_{ij})$ be the matrix of $A$ in the basis $X$, that is, $y_j = Ax_j = \sum_i \alpha_{ij} x_i$.
We observe that $A$ is invertible, since $\sum_i \xi_i y_i = 0$ implies that
$\xi_1 = \xi_2 = \dots = \xi_n = 0$.

ANSWER TO QUESTION I. Since

$\sum_j \eta_j y_j = \sum_j \eta_j A x_j = \sum_j \eta_j \sum_i \alpha_{ij} x_i = \sum_i (\sum_j \alpha_{ij} \eta_j) x_i$,

we have

(1) $\xi_i = \sum_j \alpha_{ij} \eta_j$.

ANSWER TO QUESTION II.

(2) $y = Ax$.

Roughly speaking, the invertible linear transformation $A$ (or, more
properly, the matrix $(\alpha_{ij})$) may be considered as a transformation of
coordinates (as in (1)), or it may be considered (as we usually consider it,
in (2)) as a transformation of vectors.
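
The two readings of the matrix $(\alpha_{ij})$ can be tried out on a small case. The sketch below is an illustration under assumed sample data: $X$ is taken to be the standard basis of a two-dimensional coordinate space and $Y = \{(1, 1), (1, -1)\}$; the helper `mat_vec` is a hypothetical name introduced here.

```python
def mat_vec(a, x):
    """Apply the matrix a (a list of rows) to the coordinate vector x."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in a]

# The basis Y written in the coordinates of the standard basis X (assumed data).
Y = [[1, 1], [1, -1]]

# Matrix of A in the basis X: its j-th column holds the coordinates of y_j = A x_j.
A = [[Y[j][i] for j in range(2)] for i in range(2)]

# Question I: a vector with coordinates eta with respect to Y
# has coordinates xi = [A] eta with respect to X, as in formula (1).
eta = [2, 3]
xi = mat_vec(A, eta)
direct = [sum(eta[j] * Y[j][i] for j in range(2)) for i in range(2)]
print(xi, direct)

# Question II: the same scalars used with X and with Y give x and y = Ax.
scalars = [2, 3]
x_vec = scalars                                            # sum of scalars[i] * x_i
y_vec = [sum(scalars[j] * Y[j][i] for j in range(2)) for i in range(2)]
print(mat_vec(A, x_vec), y_vec)
```

The first print compares formula (1) with a direct expansion in $Y$; the second compares $Ax$ with $\sum_i \xi_i y_i$.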

In classical treatises on vector spaces it is customary to treat vectors 
as numerical n-tuples, rather than as abstract entities; this necessitates 
the introduction of some cumbersome terminology. We give here a brief 
glossary of some of the more baffling terms and notations that arise in con- 
nection with dual spaces and adjoint transformations. 

If $V$ is an $n$-dimensional vector space, a vector $x$ is given by its
coordinates with respect to some preferred, absolute coordinate system;
these coordinates form an ordered set of $n$ scalars. It is customary to
write this set of scalars in a column,


$x = \begin{pmatrix} \xi_1 \\ \vdots \\ \xi_n \end{pmatrix}$.


Elements of the dual space $V'$ are written as rows, $x' = (\xi'_1, \dots, \xi'_n)$.
If we think of $x$ as a (rectangular) $n$-by-one matrix, and of $x'$ as a one-by-$n$
matrix, then the matrix product $x'x$ is a one-by-one matrix, that is, a
scalar. In our notation this scalar is $[x, x'] = \xi_1 \xi'_1 + \dots + \xi_n \xi'_n$. The trick
of considering vectors as thin matrices works even when we consider the
full-grown matrices of linear transformations. Thus the matrix product of
$(\alpha_{ij})$ with the column $(\xi_j)$ is the column whose $i$-th element is $\eta_i = \sum_j \alpha_{ij} \xi_j$.
Instead of worrying about dual bases and adjoint transformations, we
may form similarly the product of the row $(\xi'_i)$ with the matrix $(\alpha_{ij})$ in
the order $(\xi'_i)(\alpha_{ij})$; the result is the row that we earlier denoted by $y' = A'x'$.
The expression $[Ax, x']$ is now abbreviated as $x' \cdot A \cdot x$; both dots denote
ordinary matrix multiplication. The vectors $x$ in $V$ are called covariant and
the vectors $x'$ in $V'$ are called contravariant. Since the notion of the product
$x' \cdot x$ (that is, $[x, x']$) depends, from this point of view, on the coordinates of
$x$ and $x'$, it becomes relevant to ask the following question: if we change
basis in $V$, in accordance with the invertible linear transformation $A$, what
must we do in $V'$ to preserve the product $x' \cdot x$? In our notation: if
$[x, x'] = [y, y']$, where $y = Ax$, then how is $y'$ related to $x'$? Answer:
$y' = (A')^{-1} x'$. To express this whole tangle of ideas the classical terminology
says that the vectors $x$ vary cogrediently whereas the $x'$ vary contragrediently.
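
The answer $y' = (A')^{-1} x'$ can be verified on a small instance. The following sketch is only an illustration: it assumes two-by-two matrices with rational entries, and the helpers `mat_vec`, `transpose`, `inverse_2x2`, and `pairing` are hypothetical names, not anything defined in the text.

```python
from fractions import Fraction

def mat_vec(a, x):
    """Apply the matrix a (a list of rows) to the coordinate vector x."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in a]

def transpose(a):
    """Matrix of the adjoint with respect to the dual basis."""
    return [list(col) for col in zip(*a)]

def inverse_2x2(m):
    """Inverse of an invertible two-by-two matrix, computed exactly."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def pairing(x, x_dual):
    """[x, x'] as the product of the row x' with the column x."""
    return sum(s * t for s, t in zip(x, x_dual))

# Assumed sample data: an invertible A, a vector x, and a functional x'.
A = [[1, 2], [0, 1]]
x = [2, 3]
x_dual = [4, 5]

y = mat_vec(A, x)                                    # y = Ax
y_dual = mat_vec(inverse_2x2(transpose(A)), x_dual)  # y' = (A')^{-1} x'

print(pairing(x, x_dual), pairing(y, y_dual))
```

Both printed values agree, which is the sense in which the $x'$ vary contragrediently to the $x$.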


§ 47. Similarity


The following two questions are closely related to those of the preceding 
section. 


Question III. If $B$ is a linear transformation on $V$, what is the relation
between its matrix $(\beta_{ij})$ with respect to $X$ and its matrix $(\gamma_{ij})$ with respect
to $Y$?

Question IV. If $(\beta_{ij})$ is a matrix, what is the relation between the linear
transformations $B$ and $C$ defined, respectively, by $Bx_j = \sum_i \beta_{ij} x_i$ and
$Cy_j = \sum_i \beta_{ij} y_i$?


Questions III and IV are explicit formulations of a problem we raised
before: to one transformation there correspond (in different coordinate
systems) many matrices (question III) and to one matrix there correspond
many transformations (question IV).

ANSWER TO QUESTION III. We have


(1) $Bx_j = \sum_i \beta_{ij} x_i$

and

(2) $By_j = \sum_i \gamma_{ij} y_i$.


Using the linear transformation $A$ defined in the preceding section, we
may write


(3) $By_j = BAx_j = B(\sum_k \alpha_{kj} x_k) = \sum_k \alpha_{kj} Bx_k = \sum_k \alpha_{kj} \sum_i \beta_{ik} x_i = \sum_i (\sum_k \beta_{ik} \alpha_{kj}) x_i$,


and

(4) $\sum_k \gamma_{kj} y_k = \sum_k \gamma_{kj} Ax_k = \sum_k \gamma_{kj} \sum_i \alpha_{ik} x_i = \sum_i (\sum_k \alpha_{ik} \gamma_{kj}) x_i$.

Comparing (2), (3), and (4), we see that

$\sum_k \alpha_{ik} \gamma_{kj} = \sum_k \beta_{ik} \alpha_{kj}$.

Using matrix multiplication, we write this in the dangerously simple form

(5) $[A][C] = [B][A]$.


The danger lies in the fact that three of the four matrices in (5) correspond
to their linear transformations in the basis $X$; the fourth one, namely
the one we denoted by $[C]$, corresponds to $B$ in the basis $Y$. With this
understanding, however, (5) is correct. A more usual form of (5), adapted,
in principle, to computing $[C]$ when $[A]$ and $[B]$ are known, is


(6) $[C] = [A]^{-1}[B][A]$.


ANSWER TO QUESTION IV. To bring out the essentially geometric character
of this question and its answer, we observe that


$Cy_j = CAx_j$

and

$\sum_i \beta_{ij} y_i = \sum_i \beta_{ij} Ax_i = A(\sum_i \beta_{ij} x_i) = ABx_j$.

Hence $C$ is such that

$CAx_j = ABx_j$,

or, finally,


(7) $C = ABA^{-1}$.


There is no trouble with (7) similar to the one that caused us to make a
reservation about the interpretation of (6); to find the linear transformation
(not matrix) $C$, we multiply the transformations $A$, $B$, and $A^{-1}$, and
nothing needs to be said about coordinate systems. Compare, however, the
formulas (6) and (7), and observe once more the innate perversity of 
mathematical symbols. This is merely another aspect of the facts already 
noted in §§ 37 and 38. 

Two matrices [B] and [C] are called similar if there exists an invertible 
matrix [A] satisfying (6); two linear transformations B and C are called 
similar if there exists an invertible transformation A satisfying (7). In 
this language the answers to questions III and IV can be expressed very 
briefly; in both cases the answer is that the given matrices or transforma- 
tions must be similar. 

Having obtained the answer to question IV, we see now that there are 
too many subscripts in its formulation. The validity of (7) is a geometric 
fact quite independent of linearity, finite-dimensionality, or any other 
accidental property that A, B, and C may possess; the answer to question 
IV is also the answer to a much more general question. This geometric 
question, a paraphrase of the analytic formulation of question IV, is this: 
If $B$ transforms $V$, and if $C$ transforms $AV$ the same way, what is the
relation between $B$ and $C$? The expression “the same way” is not so vague
as it sounds; it means that if $B$ takes $x$ into, say, $u$, then $C$ takes $Ax$ into
$Au$. The answer is, of course, the same as before: since $Bx = u$ and
$Cy = v$ (where $y = Ax$ and $v = Au$), we have


$ABx = Au = v = Cy = CAx$.

The situation is conveniently summed up in the following mnemonic
diagram:

          B
     x ------> u
     |         |
   A |         | A
     ↓         ↓
     y ------> v
          C

We may go from $y$ to $v$ by using the short cut $C$, or by going around
the block; in other words $C = ABA^{-1}$. Remember that $ABA^{-1}$ is to
be applied to $y$ from right to left: first $A^{-1}$, then $B$, then $A$.
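
Formulas (5) and (6) lend themselves to a quick numerical check. The sketch below is an illustration with assumed sample data: $X$ is the standard basis of a two-dimensional space, $Y = \{(1, 1), (1, -1)\}$, and $[B]$ is a matrix chosen for the example; `matmul` and `inverse_2x2` are hypothetical helper names.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inverse_2x2(m):
    """Inverse of an invertible two-by-two matrix, computed exactly."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

# [B]: matrix of B with respect to X (assumed sample data).
B = [[2, 1], [1, 2]]
# [A]: its columns are the coordinates of y_1 = (1, 1) and y_2 = (1, -1).
A = [[1, 1], [1, -1]]

# Formula (6): the matrix of the same transformation B with respect to Y.
C = matmul(inverse_2x2(A), matmul(B, A))
print(C)

# Formula (5): [A][C] = [B][A].
print(matmul(A, C) == matmul(B, A))
```

Here the new basis happens to diagonalize the sample matrix, which makes the result easy to read off; nothing in formula (6) depends on that accident.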

We have seen that the theory of changing bases is coextensive with the 
theory of invertible linear transformations. An invertible linear trans- 
formation is an automorphism, where by an automorphism we mean an 
isomorphism of a vector space with itself. (See §9.) We observe that, 
conversely, every automorphism is an invertible linear transformation. 

We hope that the relation between linear transformations and matrices 
is by now sufficiently clear that the reader will not object if in the sequel, 
when we wish to give examples of linear transformations with various 
properties, we content ourselves with writing down a matrix. The
interpretation always to be placed on this procedure is that we have in mind
the concrete vector space $C^n$ (or one of its generalized versions $F^n$) and the
concrete basis $X = \{x_1, \dots, x_n\}$ defined by $x_i = (\delta_{i1}, \dots, \delta_{in})$. With
this understanding, a matrix $(\alpha_{ij})$ defines, of course, a unique linear
transformation $A$, given by the usual formula $A(\sum_i \xi_i x_i) = \sum_i (\sum_j \alpha_{ij} \xi_j) x_i$.


EXERCISES 


1. If $A$ is a linear transformation from a vector space $U$ to a vector space $V$,
then corresponding to each fixed $y$ in $V'$ there exists a vector, which might as well
be denoted by $A'y$, in $U'$ so that


$[Ax, y] = [x, A'y]$


for all $x$ in $U$. Prove that $A'$ is a linear transformation from $V'$ to $U'$. (The
transformation $A'$ is called the adjoint of $A$.) Interpret and prove as many as possible
among the equations § 44, (2)-(8) for this concept of adjoint.


2. (a) Prove that similarity of linear transformations on a vector space is an 
equivalence relation (that is, it is reflexive, symmetric, and transitive). 

(b) If A is similar to a scalar a, then A = a. 

(c) If $A$ and $B$ are similar, then so also are $A^2$ and $B^2$, $A'$ and $B'$, and, in case
$A$ and $B$ are invertible, $A^{-1}$ and $B^{-1}$.

(d) Generalize the concept of similarity to two transformations defined on dif- 
ferent vector spaces. Which of the preceding results remain valid for the gener- 
alized concept? 


3. (a) If A and B are linear transformations on the same vector space and if at
least one of them is invertible, then AB and BA are similar. 
(b) Does the conclusion of (a) remain valid if neither A nor B is invertible? 


4. If the matrix of a linear transformation $A$ on $C^2$, with respect to the basis
$\{(1, 0), (0, 1)\}$ is $\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$, what is the matrix of $A$ with respect to the basis
$\{(1, 1), (1, -1)\}$? What about the basis $\{(1, 0), (1, 1)\}$?


5. If the matrix of a linear transformation $A$ on $C^3$, with respect to the basis
$\{(1, 0, 0), (0, 1, 0), (0, 0, 1)\}$ is
$\begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & -1 \\ -1 & -1 & 0 \end{pmatrix}$,
what is the matrix of $A$ with respect to the basis
$\{(0, 1, -1), (1, -1, 1), (-1, 1, 0)\}$?


6. (a) The construction of a matrix associated with a linear transformation
depends on two bases, not one. Indeed, if $X = \{x_1, \dots, x_n\}$ and $\bar{X} = \{\bar{x}_1, \dots, \bar{x}_n\}$
are bases of $V$, and if $A$ is a linear transformation on $V$, then the matrix $[A; X, \bar{X}]$
of $A$ with respect to $X$ and $\bar{X}$ should be defined by

$Ax_j = \sum_i \alpha_{ij} \bar{x}_i$.

The definition adopted in the text corresponds to the special case in which $\bar{X} = X$.
The special case leads to the definition of similarity ($B$ and $C$ are similar if there
exist bases $X$ and $Y$ such that $[B; X] = [C; Y]$). The analogous relation suggested
by the general case is called equivalence; $B$ and $C$ are equivalent if there exist basis
pairs $(X, \bar{X})$ and $(Y, \bar{Y})$ such that $[B; X, \bar{X}] = [C; Y, \bar{Y}]$. Prove that this notion
is indeed an equivalence relation.

(b) Two linear transformations $B$ and $C$ are equivalent if and only if there exist
invertible linear transformations $P$ and $Q$ such that $PB = CQ$.

(c) If $A$ and $B$ are equivalent, then so also are $A'$ and $B'$.

(d) Does there exist a linear transformation $A$ such that $A$ is equivalent to a
scalar $\alpha$, but $A \neq \alpha$?

(e) Do there exist linear transformations $A$ and $B$ such that $A$ and $B$ are
equivalent, but $A^2$ and $B^2$ are not?

(f) Generalize the concept of equivalence to two transformations defined on 
different vector spaces. Which of the preceding results remain valid for the gener- 
alized concept? 


§ 48. Quotient transformations 


Suppose that $A$ is a linear transformation on a vector space $V$ and that
$M$ is a subspace of $V$ invariant under $A$. Under these circumstances there
is a natural way of defining a linear transformation (to be denoted by
$A/M$) on the space $V/M$; this “quotient transformation” is related to $A$
just about the same way as the quotient space is related to $V$. It will
be convenient (in this section) to denote $V/M$ by the more compact
symbol $V^-$, and to use related symbols for the vectors and the linear
transformations that occur. Thus, for instance, if $x$ is any vector in $V$, we
shall denote the coset $x + M$ by $x^-$; objects such as $x^-$ are the typical
elements of $V^-$.

To define the quotient transformation $A/M$ (to be denoted, alternatively,
by $A^-$), write

$A^- x^- = (Ax)^-$


for every vector $x$ in $V$. In other words, to find the transform by $A/M$
of the coset $x + M$, first find the transform by $A$ of the vector $x$, and then
form the coset of $M$ determined by that transformed vector. This
definition must be supported by an unambiguity argument; we must be sure
that if two vectors determine the same coset, then the same is true of their
transforms by $A$. The key fact here is the invariance of $M$. Indeed, if
$x + M = y + M$, then $x - y$ is in $M$, so that (invariance) $Ax - Ay$
is in $M$, and therefore $Ax + M = Ay + M$.

What happens if $M$ is not merely invariant under $A$, but, together with
a suitable subspace $N$, reduces $A$? If this happens, then $A$ is the direct
sum, say $A = B \oplus C$, of two linear transformations defined on the
subspaces $M$ and $N$ of $V$, respectively; the question is, what is the relation
between $A^-$ and $C$? Both these transformations can be considered as
complementary to $B$; the transformation $B$ describes what $A$ does on $M$,
and both $A^-$ and $C$ describe in different ways what $A$ does elsewhere.

Let $T$ be the correspondence that assigns to each vector $x$ in $N$ the coset
$x^-$ ($= x + M$). We know already that $T$ is an isomorphism between
$N$ and $V/M$ (cf. § 22, Theorem 1); we shall show now that the isomorphism
carries the transformation $C$ over to the transformation $A^-$. If $Cx = y$
(where, of course, $x$ is in $N$), then $A^- x^- = (Ax)^- = (Cx)^- = y^-$; it
follows that $TCx = Ty = A^- Tx$. This implies that $TC = A^- T$, as
promised. Loosely speaking (see § 47) we may say that $A^-$ transforms
$V^-$ the same way as $C$ transforms $N$. In other words, the linear
transformations $A^-$ and $C$ are abstractly identical (isomorphic). This fact is of great
significance in the applications of the concept of quotient space.


§ 49. Range and null-space


DEFINITION. If $A$ is a linear transformation on a vector space $V$ and if
$M$ is a subspace of $V$, the image of $M$ under $A$, in symbols $AM$, is the
set of all vectors of the form $Ax$ with $x$ in $M$. The range of $A$ is the set
$R(A) = AV$; the null-space of $A$ is the set $N(A)$ of all vectors $x$ for
which $Ax = 0$.


It is immediately verified that $AM$ and $N(A)$ are subspaces. If, as
usual, we denote by $O$ the subspace containing the vector 0 only, it is easy
to describe some familiar concepts in terms of the terminology just
introduced; we list some of the results.

(i) The transformation $A$ is invertible if and only if $R(A) = V$ and
$N(A) = O$.

(ii) In case $V$ is finite-dimensional, $A$ is invertible if and only if
$R(A) = V$ or $N(A) = O$.

(iii) The subspace $M$ is invariant under $A$ if and only if $AM \subset M$.

(iv) A pair of complementary subspaces $M$ and $N$ reduce $A$ if and only
if $AM \subset M$ and $AN \subset N$.

(v) If $E$ is the projection on $M$ along $N$, then $R(E) = M$ and $N(E) = N$.

All these statements are easy to prove; we indicate the proof of (v).
From § 41, Theorem 2, we know that $N$ is the set of all solutions of the
equation $Ex = 0$; this coincides with our definition of $N(E)$. We know
also that $M$ is the set of all solutions of the equation $Ex = x$. If $x$ is in
$M$, then $x$ is also in $R(E)$, since $x$ is the image under $E$ of something (namely
of $x$ itself). Conversely, if a vector $x$ is the image under $E$ of something,
say, $x = Ey$ (so that $x$ is in $R(E)$), then $Ex = E^2 y = Ey = x$, so that
$x$ is in $M$.

Warning: it is accidental that for projections $R \oplus N = V$. In general
it need not even be true that $R = R(A)$ and $N = N(A)$ are disjoint. It
can happen, for example, that for a certain vector $x$ we have $x \neq 0$,
$Ax \neq 0$, and $A^2 x = 0$; for such a vector, $Ax$ clearly belongs to both the
range and the null-space of $A$.
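
A concrete instance of the warning may help. The sketch below is an illustration with an assumed two-by-two matrix; `mat_vec` is a hypothetical helper name.

```python
def mat_vec(a, x):
    """Apply the matrix a (a list of rows) to the coordinate vector x."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in a]

# Assumed example: A sends (0, 1) to (1, 0) and (1, 0) to 0.
A = [[0, 1], [0, 0]]
x = [0, 1]

Ax = mat_vec(A, x)     # a non-zero vector in the range of A
AAx = mat_vec(A, Ax)   # zero, so Ax is also in the null-space of A
print(Ax, AAx)
```

For this matrix the range and the null-space are in fact the same one-dimensional subspace, so their sum is far from being the whole space.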


THEOREM. If $A$ is a linear transformation on a vector space $V$, then


(1) $(R(A))^0 = N(A')$;

if $V$ is finite-dimensional, then

(2) $(N(A))^0 = R(A')$.


PROOF. If $y$ is in $(R(A))^0$, then, for all $x$ in $V$,

$0 = [Ax, y] = [x, A'y]$,

so that $A'y = 0$ and $y$ is in $N(A')$. If, on the other hand, $y$ is in $N(A')$,
then, for all $x$ in $V$,

$0 = [x, A'y] = [Ax, y]$,

so that $y$ is in $(R(A))^0$.

If we apply (1) to $A'$ in place of $A$, we obtain


(3) $(R(A'))^0 = N(A'')$.


If $V$ is finite-dimensional (and hence reflexive), we may replace $A''$ by $A$
in (3), and then we may form the annihilator of both sides; the desired
conclusion (2) follows from § 17, Theorem 2.


EXERCISES


1. Use the differentiation operator on $P_n$ to show that the range and the
null-space of a linear transformation need not be disjoint.


2. (a) Give an example of a linear transformation on a three-dimensional space 
with a two-dimensional range. 

(b) Give an example of a linear transformation on a three-dimensional space
with a two-dimensional null-space.

3. Find a four-by-four matrix whose range is spanned by (1, 0, 1, 0) and (0, 1, 0, 1). 


4. (a) Two projections $E$ and $F$ have the same range if and only if $EF = F$ and
$FE = E$.

(b) Two projections $E$ and $F$ have the same null-space if and only if $EF = E$
and $FE = F$.


5. If $E_1, \dots, E_k$ are projections with the same range and if $\alpha_1, \dots, \alpha_k$ are scalars
such that $\sum_i \alpha_i = 1$, then $\sum_i \alpha_i E_i$ is a projection.


§ 50. Rank and nullity


We shall now restrict attention to the finite-dimensional case and draw 
certain easy conclusions from the theorem of the preceding section. 


DEFINITION. The rank, $\rho(A)$, of a linear transformation $A$ on a
finite-dimensional vector space is the dimension of $R(A)$; the nullity, $\nu(A)$,
is the dimension of $N(A)$.


THEOREM 1. If $A$ is a linear transformation on an $n$-dimensional vector
space, then $\rho(A) = \rho(A')$ and $\nu(A) = n - \rho(A)$.


PROOF. The theorem of the preceding section and § 17, Theorem 1,
together imply that


(1) $\nu(A') = n - \rho(A)$.


Let $X = \{x_1, \dots, x_n\}$ be any basis for which $x_1, \dots, x_\nu$ are in $N(A)$;
then, for any $x = \sum_i \xi_i x_i$, we have


$Ax = \sum_{i=1}^{n} \xi_i A x_i = \sum_{i=\nu+1}^{n} \xi_i A x_i$.


In other words, $Ax$ is a linear combination of the $n - \nu$ vectors $Ax_{\nu+1},
\dots, Ax_n$; it follows that $\rho(A) \leq n - \nu(A)$. Applying this result to $A'$
and using (1), we obtain


(2) $\rho(A') \leq n - \nu(A') = \rho(A)$.

In (2) we may replace $A$ by $A'$, obtaining


(3) $\rho(A) = \rho(A'') \leq \rho(A')$;

(2) and (3) together show that

(4) $\rho(A) = \rho(A')$,

and (1) and (4) together show that

(5) $\nu(A') = n - \rho(A')$.

Replacing $A$ by $A'$ in (5) gives, finally,

(6) $\nu(A) = n - \rho(A)$,


and concludes the proof of the theorem.

These results are usually discussed from a little different point of view.
Let $A$ be a linear transformation on an $n$-dimensional vector space, and
let $X = \{x_1, \dots, x_n\}$ be a basis in that space; let $[A] = (\alpha_{ij})$ be the matrix
of $A$ in the coordinate system $X$, so that


$Ax_j = \sum_i \alpha_{ij} x_i$.


Since if $x = \sum_j \xi_j x_j$, then $Ax = \sum_j \xi_j Ax_j$, it follows that every vector
in $R(A)$ is a linear combination of the $Ax_j$, and hence of any maximal
linearly independent subset of the $Ax_j$. It follows that the maximal
number of linearly independent $Ax_j$ is precisely $\rho(A)$. In terms of the
coordinates $(\alpha_{1j}, \dots, \alpha_{nj})$ of $Ax_j$ we may express this by saying that $\rho(A)$
is the maximal number of linearly independent columns of the matrix
$[A]$. Since (§ 45) the columns of $[A']$ (the matrix being expressed in terms
of the dual basis of $X$) are the rows of $[A]$, it follows from Theorem 1 that
$\rho(A)$ is also the maximal number of linearly independent rows of $[A]$.
Hence “the row rank of $[A]$ = the column rank of $[A]$ = the rank of $[A]$.”
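
The equality of row rank and column rank, and the relation $\nu(A) = n - \rho(A)$, can be observed on a small matrix. The sketch below is an illustration, not the method of the text: it computes rank by Gaussian elimination over the rationals (a swapped-in technique), and the function names and the sample matrix are assumptions made for the example.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix (list of rows) by Gauss-Jordan elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0  # number of pivots found so far
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def transpose(a):
    """Interchange rows and columns."""
    return [list(col) for col in zip(*a)]

# Assumed sample matrix: the second row is twice the first.
A = [[1, 2, 3], [2, 4, 6], [1, 0, 1]]

column_rank = rank(A)          # rank of A
row_rank = rank(transpose(A))  # rank of the transpose, i.e. of A'
nullity = len(A) - column_rank
print(column_rank, row_rank, nullity)
```

Elimination does not change the dimension of the row space, so `rank(A)` counts independent rows of its argument; applying it to the transpose counts independent columns of the original matrix.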


THEOREM 2. If $A$ is a linear transformation on the $n$-dimensional vector
space $V$, and if $H$ is any $h$-dimensional subspace of $V$, then the dimension
of $AH$ is $\geq h - \nu(A)$.


PROOF. Let $K$ be any subspace for which $V = H \oplus K$, so that if $k$ is
the dimension of $K$, then $k = n - h$. Upon operating with $A$ we obtain


$AV = AH + AK$.


(The sum is not necessarily a direct sum; see § 11.) Since $AV = R(A)$
has dimension $n - \nu(A)$, since the dimension of $AK$ is clearly $\leq k = n - h$,
and since the dimension of the sum is $\leq$ the sum of the dimensions, we have
the desired result.


THEOREM 3. If $A$ and $B$ are linear transformations on a finite-dimensional
vector space, then


(7) $\rho(A + B) \leq \rho(A) + \rho(B)$,

(8) $\rho(AB) \leq \min \{\rho(A), \rho(B)\}$,

and

(9) $\nu(AB) \leq \nu(A) + \nu(B)$.

If $B$ is invertible, then

(10) $\rho(AB) = \rho(BA) = \rho(A)$.


PROOF. Since $(AB)x = A(Bx)$, it follows that $R(AB)$ is contained in
$R(A)$, so that $\rho(AB) \leq \rho(A)$, or, in other words, the rank of a product is
not greater than the rank of the first factor. Let us apply this auxiliary
result to $B'A'$; this, together with what we already know, yields (8). If
$B$ is invertible, then


$\rho(A) = \rho(AB \cdot B^{-1}) \leq \rho(AB)$

and

$\rho(A) = \rho(B^{-1} \cdot BA) \leq \rho(BA)$;


together with (8) this yields (10). The equation (7) is an immediate
consequence of an argument we have already used in the proof of Theorem 2.
The proof of (9) we leave as an exercise for the reader. (Hint: apply
Theorem 2 with $H = BV = R(B)$.) Together the two formulas (8) and
(9) are known as Sylvester’s law of nullity.
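
The inequalities (7)-(10) can be spot-checked on particular matrices. The sketch below is only an illustration on assumed three-by-three examples; it proves nothing in general. Rank is computed by elimination over the rationals, a technique chosen for the example, and all function names are hypothetical.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix (list of rows) by Gauss-Jordan elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def matmul(a, b):
    """Product of two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def matadd(a, b):
    """Entrywise sum of two matrices of the same shape."""
    return [[s + t for s, t in zip(ra, rb)] for ra, rb in zip(a, b)]

def nullity(a):
    """Nullity as n minus rank (Theorem 1)."""
    return len(a) - rank(a)

# Assumed sample transformations on a three-dimensional space.
A = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
B = [[0, 0, 0], [0, 1, 0], [0, 0, 1]]
P = [[1, 1, 0], [0, 1, 0], [0, 0, 1]]  # invertible

AB = matmul(A, B)
print(rank(matadd(A, B)) <= rank(A) + rank(B))          # (7)
print(rank(AB) <= min(rank(A), rank(B)))                # (8)
print(nullity(AB) <= nullity(A) + nullity(B))           # (9)
print(rank(matmul(A, P)) == rank(matmul(P, A)) == rank(A))  # (10)
```

In this example (9) holds with equality, which shows that the bound cannot be improved in general.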


§51. Transformations of rank one 


We conclude our discussion of rank by a description of the matrices of
linear transformations of rank $\leq 1$.


THEOREM 1. If a linear transformation $A$ on a finite-dimensional vector
space $V$ is such that $\rho(A) \leq 1$ (that is, $\rho(A) = 0$ or $\rho(A) = 1$), then the
elements of the matrix $[A] = (\alpha_{ij})$ of $A$ have the form $\alpha_{ij} = \beta_i \gamma_j$ in every
coordinate system; conversely if the matrix of $A$ has this form in some one
coordinate system, then $\rho(A) \leq 1$.


PROOF. If $\rho(A) = 0$, then $A = 0$, and the statement is trivial. If
$\rho(A) = 1$, that is, $R(A)$ is one-dimensional, then there exists in $R(A)$ a
non-zero vector $x_0$ (a basis in $R(A)$) such that every vector in $R(A)$ is a
multiple of $x_0$. Hence, for every $x$,


$Ax = y_0 x_0$,

where the scalar coefficient $y_0$ ($= y_0(x)$) depends, of course, on $x$. The
linearity of $A$ implies that $y_0$ is a linear functional on $V$. Let
$X = \{x_1, \dots, x_n\}$ be a basis in $V$, and let $(\alpha_{ij})$ be the corresponding matrix of $A$,
so that

$Ax_j = \sum_i \alpha_{ij} x_i$.

If $X' = \{y_1, \dots, y_n\}$ is the dual basis in $V'$, then (cf. § 45, (2))


$\alpha_{ij} = [Ax_j, y_i]$.

In the present case


$\alpha_{ij} = [y_0(x_j) x_0, y_i] = y_0(x_j)[x_0, y_i] = [x_0, y_i][x_j, y_0]$;


in other words, we may take $\beta_i = [x_0, y_i]$ and $\gamma_j = [x_j, y_0]$.

Conversely, suppose that in a fixed coordinate system $X = \{x_1, \dots, x_n\}$
the matrix $(\alpha_{ij})$ of $A$ is such that $\alpha_{ij} = \beta_i \gamma_j$. We may find a linear
functional $y_0$ such that $\gamma_j = [x_j, y_0]$, and we may define a vector $x_0$ by
$x_0 = \sum_i \beta_i x_i$. The linear transformation $\tilde{A}$ defined by $\tilde{A}x = y_0(x) x_0$ is
clearly of rank one (unless, of course, $\alpha_{ij} = 0$ for all $i$ and $j$), and its matrix
$(\tilde{\alpha}_{ij})$ in the coordinate system $X$ is given by


$\tilde{\alpha}_{ij} = [\tilde{A}x_j, y_i]$

(where $X' = \{y_1, \dots, y_n\}$ is the dual basis of $X$). Hence


$\tilde{\alpha}_{ij} = [y_0(x_j) x_0, y_i] = [x_0, y_i][x_j, y_0] = \beta_i \gamma_j$,


and, since $A$ and $\tilde{A}$ have the same matrix in one coordinate system, it
follows that $\tilde{A} = A$. This concludes the proof of the theorem.
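
The form $\alpha_{ij} = \beta_i \gamma_j$ is easy to experiment with. The sketch below is an illustration with assumed sample coefficients: it builds such a matrix and confirms that every two-by-two minor vanishes, which is one standard way of seeing that the rank cannot exceed one (this criterion is an addition made for the example, not a step in the proof above).

```python
from itertools import combinations

# Assumed sample coefficients beta_i and gamma_j.
beta = [1, 2, 3]
gamma = [4, 0, -1]

# The matrix with entries alpha_ij = beta_i * gamma_j.
alpha = [[b * g for g in gamma] for b in beta]

def two_by_two_minors(m):
    """All determinants of two-by-two submatrices of the square matrix m."""
    n = len(m)
    return [m[i][j] * m[k][l] - m[i][l] * m[k][j]
            for i, k in combinations(range(n), 2)
            for j, l in combinations(range(n), 2)]

minors = two_by_two_minors(alpha)
print(alpha)
print(all(value == 0 for value in minors))
```

Each minor is $\beta_i \gamma_j \beta_k \gamma_l - \beta_i \gamma_l \beta_k \gamma_j$, which is zero identically, so the check succeeds for any choice of the coefficients.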

The following theorem sometimes makes it possible to apply Theorem 
1 to obtain results about an arbitrary linear transformation. 


THEOREM 2. If $A$ is a linear transformation of rank $\rho$ on a
finite-dimensional vector space $V$, then $A$ may be written as the sum of $\rho$
transformations of rank one.


PROOF. Since $AV = R(A)$ has dimension $\rho$, we may find $\rho$ vectors
$x_1, \dots, x_\rho$ that form a basis for $R(A)$. It follows that, for every vector
$x$ in $V$, we have

$Ax = \sum_{i=1}^{\rho} \xi_i x_i$,


where each $\xi_i$ depends, of course, on $x$; we write $\xi_i = y_i(x)$. It is easy to
see that $y_i$ is a linear functional. In terms of these $y_i$ we define, for each
$i = 1, \dots, \rho$, a linear transformation $A_i$ by $A_i x = y_i(x) x_i$. It follows
that each $A_i$ has rank one and $A = \sum_{i=1}^{\rho} A_i$. (Compare this result with
§ 32, example (2).)

A slight refinement of the proof just given yields the following result. 


THEOREM 3. Corresponding to any linear transformation $A$ on a
finite-dimensional vector space $V$ there is an invertible linear transformation $P$
for which $PA$ is a projection.


PROOF. Let $R$ and $N$, respectively, be the range and the null-space of
$A$, and let $\{x_1, \dots, x_\rho\}$ be a basis for $R$. Let $x_{\rho+1}, \dots, x_n$ be vectors such
that $\{x_1, \dots, x_n\}$ is a basis for $V$. Since $x_i$ is in $R$ for $i = 1, \dots, \rho$, we may
find vectors $y_i$ such that $Ay_i = x_i$; finally, we choose a basis for $N$, which
we may denote by $\{y_{\rho+1}, \dots, y_n\}$. We assert that $\{y_1, \dots, y_n\}$ is a basis
for $V$. We need, of course, to prove only that the $y$'s are linearly
independent. For this purpose we suppose that $\sum_{i=1}^{n} \alpha_i y_i = 0$; then we
have (remembering that for $i = \rho + 1, \dots, n$ the vector $y_i$ belongs to $N$)


$A(\sum_{i=1}^{n} \alpha_i y_i) = \sum_{i=1}^{\rho} \alpha_i x_i = 0$,


whence $\alpha_1 = \dots = \alpha_\rho = 0$. Consequently $\sum_{i=\rho+1}^{n} \alpha_i y_i = 0$; the linear
independence of $y_{\rho+1}, \dots, y_n$ shows that the remaining $\alpha$'s must also vanish.

A linear transformation $P$, of the kind whose existence we asserted, is
now determined by the conditions $Px_i = y_i$, $i = 1, \dots, n$. Indeed, if
$i = 1, \dots, \rho$, then $PAy_i = Px_i = y_i$, and if $i = \rho + 1, \dots, n$, then
$PAy_i = P0 = 0$.

Consideration of the adjoint of $A$, together with the reflexivity of $V$,
shows that we may also find an invertible $Q$ for which $AQ$ is a projection.
In case $A$ itself is invertible, we must have $P = Q = A^{-1}$.


EXERCISES 
1. What is the rank of the differentiation operator on $P_n$? What is its nullity?


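2. Find the ranks of the following matrices.


(a) $\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}$,

(b) $\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}$,

(c) $\begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}$,

(d) $\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$.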


3. If $A$ is left multiplication by $P$ on a space of linear transformations (cf.
§ 38, Ex. 5), and if $P$ has rank $m$, what is the rank of $A$?


4. The rank of the direct sum of two linear transformations (on finite-dimensional 
vector spaces) is the sum of their ranks. 


5. (a) If $A$ and $B$ are linear transformations on an $n$-dimensional vector space,
and if $AB = 0$, then $\rho(A) + \rho(B) \leq n$.

(b) For each linear transformation $A$ on an $n$-dimensional vector space there
exists a linear transformation $B$ such that $AB = 0$ and such that $\rho(A) + \rho(B) = n$.


6. If $A$, $B$, and $C$ are linear transformations on a finite-dimensional vector space,
then

$\rho(AB) + \rho(BC) \leq \rho(B) + \rho(ABC)$.


7. Prove that two linear transformations (on the same finite-dimensional vector 
space) are equivalent if and only if they have the same rank. 


8. (a) Suppose that $A$ and $B$ are linear transformations (on the same
finite-dimensional vector space) such that $A^2 = A$ and $B^2 = B$. Is it true that $A$ and
$B$ are similar if and only if $\rho(A) = \rho(B)$?

(b) Suppose that $A$ and $B$ are linear transformations (on the same
finite-dimensional vector space) such that $A \neq 0$, $B \neq 0$, and $A^2 = B^2 = 0$. Is it true
that $A$ and $B$ are similar if and only if $\rho(A) = \rho(B)$?


9. (a) If $A$ is a linear transformation of rank one, then there exists a unique
scalar $\alpha$ such that $A^2 = \alpha A$.
(b) If $\alpha \neq 1$, then $1 - A$ is invertible.


§ 52. Tensor products of transformations 


Let us now tie up linear transformations with the theory of tensor
products. Let $U$ and $V$ be finite-dimensional vector spaces (over the same
field), and let $A$ and $B$ be any two linear transformations on $U$ and $V$
respectively. We define a linear transformation $\bar{C}$ on the space $W$ of all
bilinear forms on $U \oplus V$ by writing


$(\bar{C}w)(x, y) = w(Ax, By)$.


The tensor product $C = A \otimes B$ of the transformations $A$ and $B$ is, by
definition, the dual of the transformation $\bar{C}$, so that


$(Cz)(w) = z(\bar{C}w)$


whenever $z$ is in $U \otimes V$ and $w$ is in $W$. If we apply $C$ to an element $z_0$
of the form $z_0 = x_0 \otimes y_0$ (recall that this means that $z_0(w) = w(x_0, y_0)$
for all $w$ in $W$), we obtain


$(Cz_0)(w) = z_0(\bar{C}w) = (x_0 \otimes y_0)(\bar{C}w) = (\bar{C}w)(x_0, y_0) = w(Ax_0, By_0) = (Ax_0 \otimes By_0)(w)$.

We infer that

(1) $Cz_0 = Ax_0 \otimes By_0$.


Since there are quite a few elements in $U \otimes V$ of the form $x \otimes y$, enough
at any rate to form a basis (see § 25), this relation characterizes $C$.

The formal rules for operating with tensor products go as follows.


(2) $A \otimes 0 = 0 \otimes B = 0$,

(3) $1 \otimes 1 = 1$,

(4) $(A_1 + A_2) \otimes B = (A_1 \otimes B) + (A_2 \otimes B)$,

(5) $A \otimes (B_1 + B_2) = (A \otimes B_1) + (A \otimes B_2)$,

(6) $\alpha A \otimes \beta B = \alpha\beta (A \otimes B)$,

(7) $(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$,

(8) $(A_1 A_2) \otimes (B_1 B_2) = (A_1 \otimes B_1)(A_2 \otimes B_2)$.


The proofs of all these relations, except perhaps the last two, are
straightforward.

Formula (7), as all formulas involving inverses, has to be read with
caution. It is intended to mean that if both A and B are invertible, then
so is $A \otimes B$, and the equation holds, and, conversely, that if $A \otimes B$ is
invertible, then so also are A and B. We shall prove (7) and (8) in reverse
order.

Formula (8) follows from the characterization (1) of tensor products and
the following computation:

\[ (A_1A_2 \otimes B_1B_2)(x \otimes y) = A_1A_2x \otimes B_1B_2y = (A_1 \otimes B_1)(A_2x \otimes B_2y) = (A_1 \otimes B_1)(A_2 \otimes B_2)(x \otimes y). \]

As an immediate consequence of (8) we obtain

\[ A \otimes B = (A \otimes 1)(1 \otimes B) = (1 \otimes B)(A \otimes 1). \tag{9} \]


To prove (7), suppose that A and B are invertible, and form $A \otimes B$
and $A^{-1} \otimes B^{-1}$. Since, by (8), the product of these two transformations,
in either order, is 1, it follows that $A \otimes B$ is invertible and that (7) holds.
Conversely, suppose that $A \otimes B$ is invertible. Remembering that we
defined tensor products for finite-dimensional spaces only, we may invoke
§ 36, Theorem 2; it is sufficient to prove that $Ax = 0$ implies that $x = 0$
and $By = 0$ implies that $y = 0$. We use (1):

\[ Ax \otimes By = (A \otimes B)(x \otimes y). \]

If either factor on the left is zero, then $(A \otimes B)(x \otimes y) = 0$, whence
$x \otimes y = 0$, so that either $x = 0$ or $y = 0$. Since (by (2)) $B = 0$ is
impossible, we may find a vector y so that $By \neq 0$. Applying the above
argument to this y, with any x for which $Ax = 0$, we conclude that $x = 0$. The
same argument with the roles of A and B interchanged proves that B is
invertible.

An interesting (and complicated) side of the theory of tensor products
of transformations is the theory of Kronecker products of matrices. Let
$\mathcal{X} = \{x_1, \cdots, x_n\}$ and $\mathcal{Y} = \{y_1, \cdots, y_m\}$ be bases in $\mathcal{U}$ and $\mathcal{V}$, and let
$[A] = [A; \mathcal{X}] = (\alpha_{ij})$ and $[B] = [B; \mathcal{Y}] = (\beta_{pq})$ be the matrices of A and B.
What is the matrix of $A \otimes B$ in the coordinate system $\{x_i \otimes y_p\}$?

To answer the question, we must recall the discussion in § 37 concerning
the arrangement of a basis in a linear order. Since, unfortunately, it is
impossible to write down a matrix without being committed to an order of
the rows and the columns, we shall be frank about it, and arrange the n
times m vectors $x_i \otimes y_p$ in the so-called lexicographical order, as follows:

\[ x_1 \otimes y_1,\ x_1 \otimes y_2,\ \cdots,\ x_1 \otimes y_m,\ x_2 \otimes y_1,\ \cdots,\ x_2 \otimes y_m,\ \cdots,\ x_n \otimes y_1,\ \cdots,\ x_n \otimes y_m. \]

We proceed also to carry out the following computation:

\[ (A \otimes B)(x_j \otimes y_q) = Ax_j \otimes By_q = \Bigl(\sum_i \alpha_{ij} x_i\Bigr) \otimes \Bigl(\sum_p \beta_{pq} y_p\Bigr) = \sum_i \sum_p \alpha_{ij}\beta_{pq} (x_i \otimes y_p). \]


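This process indicates exactly how far we can get without ordering the
basis elements; if, for example, we agree to index the elements of a matrix
not with a pair of integers but with a pair of pairs, say $(i, p)$ and $(j, q)$,
then we know now that the element in the $(i, p)$ row and the $(j, q)$ column
is $\alpha_{ij}\beta_{pq}$. If we use the lexicographical ordering, the matrix of $A \otimes B$ has
the form

\[
\begin{pmatrix}
\alpha_{11}\beta_{11} & \cdots & \alpha_{11}\beta_{1m} & \cdots & \alpha_{1n}\beta_{11} & \cdots & \alpha_{1n}\beta_{1m} \\
\vdots & & \vdots & & \vdots & & \vdots \\
\alpha_{11}\beta_{m1} & \cdots & \alpha_{11}\beta_{mm} & \cdots & \alpha_{1n}\beta_{m1} & \cdots & \alpha_{1n}\beta_{mm} \\
\vdots & & \vdots & & \vdots & & \vdots \\
\alpha_{n1}\beta_{11} & \cdots & \alpha_{n1}\beta_{1m} & \cdots & \alpha_{nn}\beta_{11} & \cdots & \alpha_{nn}\beta_{1m} \\
\vdots & & \vdots & & \vdots & & \vdots \\
\alpha_{n1}\beta_{m1} & \cdots & \alpha_{n1}\beta_{mm} & \cdots & \alpha_{nn}\beta_{m1} & \cdots & \alpha_{nn}\beta_{mm}
\end{pmatrix}.
\]

In a condensed notation whose meaning is clear we may write this matrix as

\[
\begin{pmatrix}
\alpha_{11}[B] & \cdots & \alpha_{1n}[B] \\
\vdots & & \vdots \\
\alpha_{n1}[B] & \cdots & \alpha_{nn}[B]
\end{pmatrix}.
\]

This matrix is known as the Kronecker product of [A] and [B], in that
order. The rule for forming it is easy to describe in words: replace each
element $\alpha_{ij}$ of the n-by-n matrix [A] by the m-by-m matrix $\alpha_{ij}[B]$. If in
this rule we interchange the roles of A and B (and consequently interchange
n and m) we obtain the definition of the Kronecker product of [B] and [A].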
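The rule just described is easy to try out numerically. The following is a minimal sketch, not part of the original exposition: the helper names `kron` and `matmul` are our own, matrices are represented as lists of rows, and the pair $(i, p)$ is placed at position $i \cdot m + p$ (counting from zero), which is the lexicographical order. The last lines check formula (8) on one arbitrarily chosen set of integer matrices; this illustrates the rule and does not prove it.

```python
def kron(a, b):
    """Kronecker product of a (n-by-n) and b (m-by-m), in that order.

    The entry in row (i, p) and column (j, q) is a[i][j] * b[p][q];
    pairs are arranged lexicographically, i.e. (i, p) -> i * m + p.
    """
    n, m = len(a), len(b)
    return [
        [a[i][j] * b[p][q] for j in range(n) for q in range(m)]
        for i in range(n)
        for p in range(m)
    ]


def matmul(x, y):
    """Ordinary product of two matrices given as lists of rows."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


# Arbitrary test data (chosen for illustration only).
A1 = [[1, 2], [3, 4]]
A2 = [[0, 1], [1, 1]]
B1 = [[2, 0, 1], [1, 1, 0], [0, 3, 1]]
B2 = [[1, 1, 0], [0, 1, 2], [1, 0, 1]]

# Formula (8): (A1 A2) (x) (B1 B2) = (A1 (x) B1)(A2 (x) B2).
lhs = kron(matmul(A1, A2), matmul(B1, B2))
rhs = matmul(kron(A1, B1), kron(A2, B2))
print(lhs == rhs)
```

Indexing rows and columns by pairs and flattening only at the end mirrors the remark above that the computation itself does not depend on how the basis is ordered; only the printed matrix does.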


EXERCISES 


1. We know that the tensor product of $\mathcal{P}_n$ and $\mathcal{P}_m$ may be identified with the
space $\mathcal{P}_{n,m}$ of polynomials in two variables (see § 25, Ex. 2). Prove that if A and
B are differentiation on $\mathcal{P}_n$ and $\mathcal{P}_m$ respectively, and if $C = A \otimes B$, then C is
mixed partial differentiation, that is, if z is in $\mathcal{P}_{n,m}$, then
$Cz = \dfrac{\partial^2 z}{\partial s\, \partial t}$.


2. With the lexicographic ordering of the product basis $\{x_i \otimes y_p\}$ it turned out
that the matrix of $A \otimes B$ is the Kronecker product of the matrices of A and B.
Is there an arrangement of the basis vectors such that the matrix of $A \otimes B$,
referred to the coordinate system so arranged, is the Kronecker product of the
matrices of B and A (in that order)?


3. If A and B are linear transformations, then
$\rho(A \otimes B) = \rho(A)\rho(B)$.


§ 53. Determinants 


It is, of course, possible to generalize the considerations of the preceding 
section to multilinear forms and multiple tensor products. Instead of 
entering into that part of multilinear algebra, we proceed in a different 
direction; we go directly after determinants. 

Suppose that A is a linear transformation on an n-dimensional vector
space $\mathcal{V}$ and let w be an alternating n-linear form on $\mathcal{V}$. If we write $\bar{A}w$
for the function defined by

\[ (\bar{A}w)(x_1, \cdots, x_n) = w(Ax_1, \cdots, Ax_n), \]

then $\bar{A}w$ is an alternating n-linear form on $\mathcal{V}$, and, in fact, $\bar{A}$ is a linear
transformation on the space of such forms. Since (see § 31) that space is
one-dimensional, it follows that $\bar{A}$ is equal to multiplication by an
appropriate scalar. In other words, there exists a scalar $\delta$ such that $\bar{A}w = \delta w$
for every alternating n-linear form w. By this somewhat roundabout
procedure (from A to $\bar{A}$ to $\delta$) we have associated a uniquely determined
scalar $\delta$ with every linear transformation A on $\mathcal{V}$; we call $\delta$ the determinant
of A, and we write $\delta = \det A$. Observe that det is neither a scalar nor a
transformation, but a function that associates a scalar with each linear
transformation.

Our immediate purpose is to study the function det. We begin by finding
the determinants of the simplest linear transformations, that is, the
multiplications by scalars. If $Ax = \alpha x$ for every x in $\mathcal{V}$, then

\[ (\bar{A}w)(x_1, \cdots, x_n) = w(\alpha x_1, \cdots, \alpha x_n) = \alpha^n w(x_1, \cdots, x_n) \]

for every alternating n-linear form w; it follows that $\det A = \alpha^n$. We note,
in particular, that det 0 = 0 and det 1 = 1.

Next we ask about the multiplicative properties of det. Suppose that
A and B are linear transformations on $\mathcal{V}$, and write C = AB. If w is
an alternating n-linear form, then

\[ (\bar{C}w)(x_1, \cdots, x_n) = w(ABx_1, \cdots, ABx_n) = (\bar{A}w)(Bx_1, \cdots, Bx_n) = (\bar{B}\bar{A}w)(x_1, \cdots, x_n), \]

so that $\bar{C} = \bar{B}\bar{A}$. Since

\[ \bar{C}w = (\det C)w \]

and

\[ \bar{B}\bar{A}w = (\det B)\bar{A}w = (\det B)(\det A)w, \]

it follows that

\[ \det (AB) = (\det A)(\det B). \]

(The values of det are scalars, and therefore commute with each other.)

A linear transformation A is called singular if det A = 0 and non-singular
otherwise. Our next result is that A is invertible if and only if it is
non-singular. Indeed, if A is invertible, then

\[ 1 = \det 1 = \det (AA^{-1}) = (\det A)(\det A^{-1}), \]

and therefore $\det A \neq 0$. Suppose, on the other hand, that $\det A \neq 0$.
If $\{x_1, \cdots, x_n\}$ is a basis in $\mathcal{V}$, and if w is a non-zero alternating n-linear
form on $\mathcal{V}$, then $(\det A)w(x_1, \cdots, x_n) \neq 0$ by § 30, Theorem 3; in other words,
$w(Ax_1, \cdots, Ax_n) \neq 0$. This implies, by § 30, Theorem 2, that the set
$\{Ax_1, \cdots, Ax_n\}$ is linearly independent (and therefore a basis); from this,
in turn, we infer that A is invertible.

In the classical literature determinant is defined as a function of matrices
(not linear transformations); we are now in a position to make contact
with that approach. We shall derive an expression for det A in terms of
the elements $\alpha_{ij}$ of the matrix corresponding to A in some coordinate
system $\{x_1, \cdots, x_n\}$. Let w be a non-zero alternating n-linear form; we
know that

\[ (\det A)w(x_1, \cdots, x_n) = w(Ax_1, \cdots, Ax_n). \tag{1} \]

If we replace each $Ax_j$ in the right side of (1) by $\sum_i \alpha_{ij}x_i$ and expand the
result by multilinearity, we obtain a long linear combination of terms such
as $w(z_1, \cdots, z_n)$, where each z is one of the x's. (Compare this part of the
argument with the proof of § 30, Theorem 3.) If, in such a term, two of
the z's coincide, then, since w is alternating, that term must vanish. If,
on the other hand, all the z's are distinct, then $w(z_1, \cdots, z_n) = \pi w(x_1, \cdots, x_n)$
for some permutation $\pi$, and, moreover, every permutation $\pi$ can occur in
this way. The coefficient of the term $\pi w(x_1, \cdots, x_n)$ is the product
$\alpha_{\pi(1),1} \cdots \alpha_{\pi(n),n}$. Since (§ 30, Theorem 1) w is skew symmetric, it follows
that

\[ \det A = \sum_{\pi} (\operatorname{sgn} \pi)\, \alpha_{\pi(1),1} \cdots \alpha_{\pi(n),n}, \tag{2} \]

where the summation is extended over all permutations $\pi$ in $\mathcal{S}_n$. (Recall
that $w(x_1, \cdots, x_n) \neq 0$, by § 30, Theorem 3, so that division by
$w(x_1, \cdots, x_n)$ is legitimate.)

From this classical equation (2) we could derive many special properties
of determinants by straightforward computation. Here is one example.
If $\sigma$ and $\pi$ are permutations (in $\mathcal{S}_n$), then (since $\pi\sigma$ is also a permutation),
it follows that the products $\alpha_{\pi(1),1} \cdots \alpha_{\pi(n),n}$ and
$\alpha_{\pi\sigma(1),\sigma(1)} \cdots \alpha_{\pi\sigma(n),\sigma(n)}$
differ in the order of their factors only. If, for each $\pi$, we take $\sigma = \pi^{-1}$,
and then alter each summand in (2) accordingly, we obtain

\[ \det A = \sum_{\pi} (\operatorname{sgn} \pi)\, \alpha_{1,\pi^{-1}(1)} \cdots \alpha_{n,\pi^{-1}(n)}. \]

(Note that $\operatorname{sgn} \pi = \operatorname{sgn} \pi^{-1}$ and that the sum over all $\pi$ is the same as
the sum over all $\pi^{-1}$.) Since this last sum is just like the sum in (2), except
that $\alpha_{i,\pi(i)}$ appears in place of $\alpha_{\pi(i),i}$, it follows from an application of
(2) to A' in place of A that

\[ \det A = \det A'. \]
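The expansion (2) can be transcribed almost word for word into a short program. The sketch below is an illustration only, under our own conventions: indices start at 0, a permutation is a tuple of images, and the names `sign`, `det`, `matmul`, and `transpose` are hypothetical helpers, not anything defined in the text. The sum has n! terms, so this is meant for small matrices; it then checks the multiplicative law and the equation det A = det A' on one pair of integer matrices.

```python
from itertools import permutations


def sign(perm):
    """Sign of a permutation given as a tuple of 0-based images."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def det(a):
    """Determinant as the sum over all permutations, as in formula (2)."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for col in range(n):
            term *= a[perm[col]][col]  # the factor alpha_{pi(col), col}
        total += term
    return total


def matmul(x, y):
    """Ordinary product of two matrices given as lists of rows."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def transpose(a):
    """Rows become columns."""
    return [list(row) for row in zip(*a)]


# Arbitrary integer matrices, chosen for illustration only.
A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
B = [[1, 0, 2], [0, 1, 1], [3, 0, 1]]

print(det(matmul(A, B)) == det(A) * det(B))
print(det(transpose(A)) == det(A))
```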


Here is another useful fact about determinants. If $\mathfrak{M}$ is a subspace
invariant under A, if B is the transformation A considered on $\mathfrak{M}$ only,
and if C is the quotient transformation $A/\mathfrak{M}$, then

\[ \det A = \det B \cdot \det C. \]

This multiplicative relation holds if, in particular, A is the direct sum of
two transformations B and C. The proof can be based directly on the
definition of determinants, or, alternatively, on the expansion obtained in
the preceding paragraph.

If, for a fixed linear transformation A, we write $p(\lambda) = \det (A - \lambda)$,
then p is a function of the scalar $\lambda$; we assert that it is, in fact, a
polynomial of degree n in $\lambda$, and that the coefficient of $\lambda^n$ is $(-1)^n$. For the
proof we may use the notation of (1). It is easy to see that
$w((A - \lambda)x_1, \cdots, (A - \lambda)x_n)$ is a sum of terms such as $\lambda^k w(y_1, \cdots, y_n)$, where $y_i = x_i$
for exactly k values of i and $y_i = Ax_i$ for the remaining n - k values of
i (k = 0, 1, ···, n). The polynomial p is called the characteristic polynomial
of A; the equation p = 0, that is, $\det (A - \lambda) = 0$, is the characteristic
equation of A. The roots of the characteristic equation of A (that is,
the scalars $\alpha$ such that $\det (A - \alpha) = 0$) are called the characteristic roots
of A.
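For a concrete instance (a worked sketch supplied for illustration; the 2-by-2 case is our own choice), suppose that n = 2 and that the matrix of A in some coordinate system is $(\alpha_{ij})$. Formula (2), applied to $A - \lambda$ in place of A, gives

\[ p(\lambda) = \det (A - \lambda) = (\alpha_{11} - \lambda)(\alpha_{22} - \lambda) - \alpha_{21}\alpha_{12} = \lambda^2 - (\alpha_{11} + \alpha_{22})\lambda + (\alpha_{11}\alpha_{22} - \alpha_{21}\alpha_{12}). \]

This is a polynomial of degree 2 in $\lambda$ whose leading coefficient is $(-1)^2 = 1$ and whose constant term is $p(0) = \det A$, in agreement with the general assertion.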


EXERCISES 


1. Use determinants to get a new proof of the fact that if A and B are linear 
transformations on a finite-dimensional vector space, and if AB = 1, then both A 
and B are invertible. 


2. If A and B are linear transformations such that AB = 0, $A \neq 0$, $B \neq 0$,
then det A = det B = 0.


3. Suppose that $(\alpha_{ij})$ is a non-singular n-by-n matrix, and suppose that $A_1, \cdots, A_n$
are linear transformations (on the same vector space). Prove that if the linear
transformations $\sum_j \alpha_{ij}A_j$, $i = 1, \cdots, n$, commute with each other, then the same
is true of $A_1, \cdots, A_n$.


4. If $\{x_1, \cdots, x_n\}$ and $\{y_1, \cdots, y_n\}$ are bases in the same vector space, and if A
is a linear transformation such that $Ax_i = y_i$, $i = 1, \cdots, n$, then $\det A \neq 0$.


5. Suppose that $\{x_1, \cdots, x_n\}$ is a basis in a finite-dimensional vector space $\mathcal{V}$.
If $y_1, \cdots, y_n$ are vectors in $\mathcal{V}$, write $w(y_1, \cdots, y_n)$ for the determinant of the linear
transformation A such that $Ax_j = y_j$, $j = 1, \cdots, n$. Prove that w is an alternating
n-linear form.


6. If, in accordance with § 53, (2), the determinant of a matrix $(\alpha_{ij})$ (not a
linear transformation) is defined to be $\sum_{\pi} (\operatorname{sgn} \pi)\, \alpha_{\pi(1),1} \cdots \alpha_{\pi(n),n}$, then, for each
linear transformation A, the determinants of all the matrices $[A; \mathcal{X}]$ are all equal
to each other. (Here $\mathcal{X}$ is an arbitrary basis.)


7. If $(\alpha_{ij})$ is an n-by-n matrix such that $\alpha_{ij} = 0$ for more than $n^2 - n$ pairs of
values of i and j, then $\det (\alpha_{ij}) = 0$.


8. If A and B are linear transformations on vector spaces of dimensions n and
m, respectively, then
$\det (A \otimes B) = (\det A)^m \cdot (\det B)^n$.

9. If A, B, C, and D are matrices such that C and D commute and D is invertible,
then (cf. § 38, Ex. 19)

\[ \det \begin{pmatrix} A & B \\ C & D \end{pmatrix} = \det (AD - BC). \]

(Hint: multiply on the right by $\begin{pmatrix} D & 0 \\ -C & D^{-1} \end{pmatrix}$.) What if D is not invertible? What
if C and D do not commute?
10. Do A and A’ always have the same characteristic polynomial? 


11. (a) If A and B are similar, then det A = det B. 

(b) If A and B are similar, then A and B have the same characteristic poly- 
nomial. 

(c) If A and B have the same characteristic polynomial, then det A = det B. 

(d) Is the converse of any of these assertions true? 


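12. Determine the characteristic polynomial of the matrix (or, rather, of the
linear transformation defined by the matrix)

\[
\begin{pmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
\alpha_{n-1} & \alpha_{n-2} & \alpha_{n-3} & \cdots & \alpha_0
\end{pmatrix}
\]

and conclude that every polynomial is the characteristic polynomial of some linear
transformation.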


13. Suppose that A and B are linear transformations on the same finite-di- 
mensional vector space. 

(a) Prove that if A is a projection, then AB and BA have the same charac- 
teristic polynomial. (Hint: choose a basis that makes the matrix of A as simple as 
possible and then compute directly with matrices.) 

(b) Prove that, in all cases, AB and BA have the same characteristic polynomial.
(Hint: find an invertible P such that PA is a projection and apply (a) to PA and
$BP^{-1}$.)


§ 54. Proper values 


A scalar $\lambda$ is a proper value and a non-zero vector x is a proper vector of a
linear transformation A if $Ax = \lambda x$. Almost every combination of the
adjectives proper, latent, characteristic, eigen, and secular, with the nouns
root, number, and value, has been used in the literature for what we call a
proper value. It is important to be aware of the order of choice in the
definition; $\lambda$ is a proper value of A if there exists a non-zero vector x for
which $Ax = \lambda x$, and a non-zero vector x is a proper vector of A if there
exists a scalar $\lambda$ for which $Ax = \lambda x$.

Suppose that $\lambda$ is a proper value of A; let $\mathfrak{M}$ be the collection of all vectors
x that are proper vectors of A belonging to this proper value, that is, for
which $Ax = \lambda x$. Since, by our definition, 0 is not a proper vector, $\mathfrak{M}$ does
not contain 0; if, however, we enlarge $\mathfrak{M}$ by adjoining the origin to it, then
$\mathfrak{M}$ becomes a subspace. We define the multiplicity of the proper value $\lambda$ as
the dimension of the subspace $\mathfrak{M}$; a simple proper value is one whose
multiplicity is equal to 1. By an obvious extension of this terminology, we
may express the fact that a scalar $\lambda$ is not a proper value of A at all by saying
that $\lambda$ is a proper value of multiplicity zero. The set of proper values of A
is sometimes called the spectrum of A. Note that the spectrum of A is the
same as the set of all scalars $\lambda$ for which $A - \lambda$ is not invertible.

If the vector space we are working with has dimension n, then the scalar
0 is a proper value of multiplicity n of the linear transformation 0, and,
similarly, the scalar 1 is a proper value of multiplicity n of the linear
transformation 1. Since $Ax = \lambda x$ if and only if $(A - \lambda)x = 0$, that is, if and
only if x is in the null-space of $A - \lambda$, it follows that the multiplicity of $\lambda$ as
a proper value of A is the same as the nullity of the linear transformation
$A - \lambda$. From this, in turn, we infer (see § 50, Theorem 1) that the proper
values of A, together with their associated multiplicities, are exactly the
same as those of A'.

We observe that if B is any invertible transformation, then

\[ BAB^{-1} - \lambda = B(A - \lambda)B^{-1}, \]

so that $(A - \lambda)x = 0$ if and only if $(BAB^{-1} - \lambda)Bx = 0$. This implies
that all spectral concepts (for example, the spectrum and the multiplicities
of the proper values) are invariant under the replacement of A by $BAB^{-1}$.
We note also that if $Ax = \lambda x$, then

\[ A^2x = A(Ax) = A(\lambda x) = \lambda(Ax) = \lambda(\lambda x) = \lambda^2 x. \]


More generally, if p is any polynomial, then $p(A)x = p(\lambda)x$, so that every
proper vector of A, belonging to the proper value $\lambda$, is also a proper vector
of p(A), belonging to the proper value $p(\lambda)$. Hence if A satisfies any
equation of the form p(A) = 0, then $p(\lambda) = 0$ for every proper value $\lambda$ of A.
Since a necessary and sufficient condition that $A - \lambda$ have a non-trivial
null-space is that it be singular, that is, that $\det (A - \lambda) = 0$, it follows
that $\lambda$ is a proper value of A if and only if it is a characteristic root of A.
This fact is the reason for the importance of determinants in linear algebra.
The useful geometric concept is that of a proper value. From the geometry
of the situation, however, it is impossible to prove that any proper values
exist. By means of determinants we reduce the problem to an algebraic
one; it turns out that proper values are the same as roots of a certain
polynomial equation. No wonder now that it is hard to prove that proper
values always exist: polynomial equations do not always have roots, and,
correspondingly, there are easy examples of linear transformations with no
proper values.
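One such example (a standard one, supplied here for illustration and not taken from the text) is the rotation of the real plane through a right angle. If A is the linear transformation on the two-dimensional real coordinate space whose matrix is

\[ \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \]

then

\[ \det (A - \lambda) = \lambda^2 + 1, \]

and this polynomial has no real roots; consequently A has no proper values. If the same matrix is interpreted as a transformation on the corresponding complex space, the characteristic roots $\pm i$ appear, with proper vectors $(1, -i)$ and $(1, i)$ respectively.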


§ 55. Multiplicity


The discussion in the preceding section indicates one of our reasons for 
wanting to study complex vector spaces. By the so-called fundamental 
theorem of algebra, a polynomial equation over the field of complex num- 
bers always has at least one root; it follows that a linear transformation on 
a complex vector space always has at least one proper value. There are 
other fields, besides the field of complex numbers, over which every poly- 
nomial equation is solvable; they are called algebraically closed fields. The 
most general result of the kind we are after at the moment is that every 
linear transformation on a finite-dimensional vector space over an algebrai- 
cally closed field has at least one proper value. Throughout the rest of this 
chapter (in the next four sections) we shall assume that our field of scalars 
is algebraically closed. The use we shall make of this assumption is the 
one just mentioned, namely, that from it we may conclude that proper 
values always exist. 

The algebraic point of view on proper values suggests another possible
definition of multiplicity. Suppose that A is a linear transformation on a
finite-dimensional vector space, and suppose that $\lambda$ is a proper value of A.
We might wish to consider the multiplicity of $\lambda$ as a root of the
characteristic equation of A. This is a useful concept, which we shall call the
algebraic multiplicity of $\lambda$, to distinguish it from our earlier, geometric, notion
of multiplicity.

The two concepts of multiplicity do not coincide, as the following
example shows. If D is differentiation on the space $\mathcal{P}_n$ of all polynomials of
degree $\leq n - 1$, then a necessary and sufficient condition that a vector x
in $\mathcal{P}_n$ be a proper vector of D is that $\dfrac{dx}{dt} = \lambda x(t)$ for some complex number $\lambda$.
We borrow from the elementary theory of differential equations the fact
that every solution of this equation is a constant multiple of $e^{\lambda t}$. Since,
unless $\lambda = 0$, only the zero multiple of $e^{\lambda t}$ is a polynomial (which it must be
if it is to belong to $\mathcal{P}_n$), we must have $\lambda = 0$ and $x(t) = 1$. In other words,
this particular transformation has only one proper value (which must
therefore occur with algebraic multiplicity n), namely, $\lambda = 0$; but, and this is
more disturbing, the dimension of the linear manifold of solutions is exactly
one. Hence if n > 1, the two definitions of multiplicity give different
values. (In this argument we used the simple fact that a polynomial equation
of degree n over an algebraically closed field has exactly n roots, if
multiplicities are suitably counted. It follows that a linear transformation on an
n-dimensional vector space over such a field has exactly n proper values,
counting algebraic multiplicities.)
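The discrepancy can be observed directly on the matrix of D. The sketch below is an illustration under assumptions of our own: we use the basis $1, t, \cdots, t^{n-1}$, exact rational arithmetic, and hypothetical helper names (`differentiation_matrix`, `rank`). In this basis the matrix of D is triangular with zeros on the diagonal, so that $\det (D - \lambda) = (-\lambda)^n$ and the algebraic multiplicity of 0 is the number of zero diagonal entries; the geometric multiplicity is the nullity of D, computed here as n minus the rank.

```python
from fractions import Fraction


def differentiation_matrix(n):
    """Matrix of D on polynomials of degree <= n - 1, basis 1, t, ..., t^(n-1).

    Column j holds the coordinates of D(t^j) = j * t^(j-1).
    """
    d = [[Fraction(0)] * n for _ in range(n)]
    for j in range(1, n):
        d[j - 1][j] = Fraction(j)
    return d


def rank(a):
    """Rank by Gaussian elimination over the rationals."""
    rows = [row[:] for row in a]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


n = 5
D = differentiation_matrix(n)

# Geometric multiplicity of the proper value 0: the nullity of D - 0.
geometric = n - rank(D)

# D is triangular, so det(D - lambda) is the product of the diagonal entries
# of D - lambda; each zero on the diagonal contributes one factor (-lambda).
algebraic = sum(1 for i in range(n) if D[i][i] == 0)

print(geometric, algebraic)
```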

It is quite easy to see that the geometric multiplicity of $\lambda$ is never greater
than its algebraic multiplicity. Indeed, if A is any linear transformation,
if $\lambda_0$ is any of its proper values, and if $\mathfrak{M}$ is the subspace of solutions of
$Ax = \lambda_0 x$, then it is clear that $\mathfrak{M}$ is invariant under A. If $A_0$ is the linear
transformation A considered on $\mathfrak{M}$ only, then it is clear (from the multiplicative
relation of § 53 for invariant subspaces) that $\det (A_0 - \lambda)$
is a factor of $\det (A - \lambda)$. If the dimension of $\mathfrak{M}$ (= the geometric
multiplicity of $\lambda_0$) is m, then $\det (A_0 - \lambda) = (\lambda_0 - \lambda)^m$; the desired result
follows from the definition of algebraic multiplicity. It follows also that
if $\lambda_1, \cdots, \lambda_p$ are the distinct proper values of A, with respective geometric
multiplicities $m_1, \cdots, m_p$, and if it happens that $\sum_{i=1}^{p} m_i = n$, then $m_i$ is
equal to the algebraic multiplicity of $\lambda_i$ for each $i = 1, \cdots, p$.

By means of proper values and their algebraic multiplicities we can 
characterize two interesting functions of linear transformations; one of 
them is the determinant and the other is something new. (Warning: these 
characterizations are valid only under our current assumption that the 
scalar field is algebraically closed.) 

Let A be any linear transformation on an n-dimensional vector space,
and let $\lambda_1, \cdots, \lambda_p$ be its distinct proper values. Let us denote by $m_j$ the
algebraic multiplicity of $\lambda_j$, $j = 1, \cdots, p$, so that $m_1 + \cdots + m_p = n$. For
any polynomial equation

\[ \alpha_0 + \alpha_1\lambda + \cdots + \alpha_n\lambda^n = 0, \]

the product of the roots is $(-1)^n \alpha_0/\alpha_n$ and the sum of the roots is
$-\alpha_{n-1}/\alpha_n$. Since the leading coefficient ($= \alpha_n$) of the characteristic
polynomial $\det (A - \lambda)$ is $(-1)^n$ and since the constant term ($= \alpha_0$) is
$\det (A - 0) = \det A$, we have

\[ \det A = \prod_{j=1}^{p} \lambda_j^{m_j}. \]

This characterization of the determinant motivates the definition

\[ \operatorname{tr} A = \sum_{j=1}^{p} m_j\lambda_j; \]

the function so defined is called the trace of A. We shall have no occasion
to use trace in the sequel; we leave the derivation of the basic properties
of the trace to the interested reader.
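As a small check of these two formulas (an illustrative example of our own; the matrix is chosen triangular, so that only the identity permutation contributes to the expansion (2) of § 53), let A have the matrix

\[ \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}. \]

Then $\det (A - \lambda) = (2 - \lambda)^2 (3 - \lambda)$, so that the distinct proper values are $\lambda_1 = 2$ and $\lambda_2 = 3$, with algebraic multiplicities $m_1 = 2$ and $m_2 = 1$. Accordingly

\[ \det A = 2^2 \cdot 3 = 12, \qquad \operatorname{tr} A = 2 \cdot 2 + 1 \cdot 3 = 7. \]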

EXERCISES


1. Find all (complex) proper values and proper vectors of the following matrices. 


(a) (5 o): (a) ( ; 1) 

ł 1 1 
(b) (o Di 111 
Pa o(s 1a} 


2. Let $\pi$ be a permutation of the integers $\{1, \cdots, n\}$; if $x = (\xi_1, \cdots, \xi_n)$ is a
vector in $\mathcal{C}^n$, write $Ax = (\xi_{\pi(1)}, \cdots, \xi_{\pi(n)})$. Find the spectrum of A.


3. Prove that all the proper values of a projection are 0 or 1 and that all the 
proper values of an involution are +1 or —1. (This result does not depend on 
the finite-dimensionality of the vector space.) 


4. Suppose that A is a linear transformation and that p is a polynomial. We
know that if $\lambda$ is a proper value of A, then $p(\lambda)$ is a proper value of p(A); what
can be said about the converse?

5. Prove that the differentiation operator D on the space $\mathcal{P}_n$ (n > 1) is not
reducible (that is, it is not reduced by any non-trivial pair of complementary
subspaces $\mathfrak{M}$ and $\mathfrak{N}$).

6. If A is a linear transformation on a finite-dimensional vector space, and if $\lambda$
is a proper value of A, then the algebraic multiplicity of $\lambda$ for A is equal to the
algebraic multiplicity of $\lambda$ for $BAB^{-1}$. (Here B is an arbitrary invertible
transformation.)


7. Do AB and BA always have the same spectrum? 


8. Suppose that A and B are linear transformations on finite-dimensional vector
spaces.

(a) $\operatorname{tr} (A \oplus B) = \operatorname{tr} A + \operatorname{tr} B$.

(b) $\operatorname{tr} (A \otimes B) = (\operatorname{tr} A)(\operatorname{tr} B)$.

(c) The spectrum of $A \oplus B$ is the union of the spectra of A and B.

(d) The spectrum of $A \otimes B$ consists of all the scalars of the form $\alpha\beta$, with $\alpha$
and $\beta$ in the spectrum of A and of B, respectively.


§ 56. Triangular form 


It is now quite easy to prove the easiest one of the so-called canonical 
form theorems. Our assumption about the scalar field (namely, that it is 
algebraically closed) is still in force. 


THEOREM 1. If A is any linear transformation on an n-dimensional vector
space $\mathcal{V}$, then there exist n + 1 subspaces $\mathfrak{M}_0, \mathfrak{M}_1, \cdots, \mathfrak{M}_{n-1}, \mathfrak{M}_n$ with the
following properties:
(i) each $\mathfrak{M}_j$ (j = 0, 1, ···, n - 1, n) is invariant under A,
(ii) the dimension of $\mathfrak{M}_j$ is j,
(iii) $(\mathcal{O} =)\ \mathfrak{M}_0 \subset \mathfrak{M}_1 \subset \cdots \subset \mathfrak{M}_{n-1} \subset \mathfrak{M}_n\ (= \mathcal{V})$.

PROOF. If n = 0 or n = 1, the result is trivial; we proceed by induction,
assuming that the statement is correct for n - 1. Consider the dual
transformation A' on $\mathcal{V}'$; since it has at least one proper vector, say x', there
exists a one-dimensional subspace $\mathfrak{M}$ invariant under it, namely, the set
of all multiples of x'. Let us denote by $\mathfrak{M}_{n-1}$ the annihilator (in $\mathcal{V}'' = \mathcal{V}$)
of $\mathfrak{M}$, $\mathfrak{M}_{n-1} = \mathfrak{M}^0$; then $\mathfrak{M}_{n-1}$ is an (n - 1)-dimensional subspace of $\mathcal{V}$,
and $\mathfrak{M}_{n-1}$ is invariant under A. Consequently we may consider A as a
linear transformation on $\mathfrak{M}_{n-1}$ alone, and we may find $\mathfrak{M}_0, \mathfrak{M}_1, \cdots, \mathfrak{M}_{n-2},
\mathfrak{M}_{n-1}$, satisfying the conditions (i), (ii), (iii). We write $\mathfrak{M}_n = \mathcal{V}$, and we
are done.

The chief interest of this theorem comes from its matricial
interpretation. Since $\mathfrak{M}_1$ is one-dimensional, we may find in it a vector $x_1 \neq 0$.
Since $\mathfrak{M}_1 \subset \mathfrak{M}_2$, it follows that $x_1$ is also in $\mathfrak{M}_2$, and since $\mathfrak{M}_2$ is
two-dimensional, we may find in it a vector $x_2$ such that $x_1$ and $x_2$ span $\mathfrak{M}_2$. We
proceed in this way by induction, choosing vectors $x_j$ so that $x_1, \cdots, x_j$ lie in
$\mathfrak{M}_j$ and span $\mathfrak{M}_j$ for $j = 1, \cdots, n$. We obtain finally a basis $\mathcal{X} = \{x_1, \cdots, x_n\}$
in $\mathcal{V}$; let us compute the matrix of A in this coordinate system.
Since $x_j$ is in $\mathfrak{M}_j$ and since $\mathfrak{M}_j$ is invariant under A, it follows that $Ax_j$
must be a linear combination of $x_1, \cdots, x_j$. Hence in the expression

\[ Ax_j = \sum_i \alpha_{ij}x_i \]

the coefficient of $x_i$ must vanish whenever i > j; in other words, i > j
implies $\alpha_{ij} = 0$. Hence the matrix of A has the triangular form

\[
[A] = \begin{pmatrix}
\alpha_{11} & \alpha_{12} & \alpha_{13} & \cdots & \alpha_{1n} \\
0 & \alpha_{22} & \alpha_{23} & \cdots & \alpha_{2n} \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & \alpha_{n-1,n} \\
0 & 0 & 0 & \cdots & \alpha_{nn}
\end{pmatrix}.
\]

It is clear from this representation that $\det (A - \alpha_{ii}) = 0$ for $i = 1, \cdots, n$,
so that the $\alpha_{ii}$ are the proper values of A, appearing on the main diagonal
of [A] with the proper multiplicities. We sum up as follows.


THEOREM 2. If A is a linear transformation on an n-dimensional vector
space $\mathcal{V}$, then there exists a basis $\mathcal{X}$ in $\mathcal{V}$ such that the matrix $[A; \mathcal{X}]$ is
triangular; or, equivalently, if [A] is any matrix, there exists a non-singular
matrix [B] such that $[B]^{-1}[A][B]$ is triangular.

The triangular form is useful for proving many results about linear
transformations. It follows from it, for example, that for any polynomial
p, the proper values of p(A), including their algebraic multiplicities, are
precisely the numbers $p(\lambda)$, where $\lambda$ runs through the proper values of A.

A large part of the theory of linear transformations is devoted to improving
the triangularization result just obtained. The best thing a matrix can
be is not triangular but diagonal (that is, $\alpha_{ij} = 0$ unless i = j); if a linear
transformation is such that its matrix with respect to a suitable coordinate
system is diagonal we shall call the transformation diagonable.
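The remark above about the proper values of p(A) can be watched at work on a triangular matrix. The following sketch is an illustration only, with names of our own choosing (`matmul`, `poly_of_matrix`, `poly_of_scalar`) and with a polynomial represented by its list of coefficients, constant term first. It evaluates p at an upper triangular integer matrix T by Horner's rule and compares the diagonal of p(T) with the values of p at the diagonal entries of T.

```python
def matmul(x, y):
    """Ordinary product of two square matrices given as lists of rows."""
    n = len(x)
    return [
        [sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def poly_of_matrix(coeffs, a):
    """Evaluate c0 + c1*A + ... + ck*A^k by Horner's rule."""
    n = len(a)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = matmul(result, a)  # multiply the running value by A
        for i in range(n):
            result[i][i] += c  # then add c times the identity
    return result


def poly_of_scalar(coeffs, t):
    """Evaluate the same polynomial at a scalar t."""
    value = 0
    for c in reversed(coeffs):
        value = value * t + c
    return value


# An upper triangular matrix and a polynomial, chosen for illustration only.
T = [[2, 5, -1], [0, 2, 7], [0, 0, -3]]
p = [1, 0, -4, 1]  # p(t) = 1 - 4 t^2 + t^3

P = poly_of_matrix(p, T)
diagonal = [P[i][i] for i in range(3)]
expected = [poly_of_scalar(p, T[i][i]) for i in range(3)]

print(diagonal == expected)
```

Since p(T) is again triangular, its diagonal entries are its proper values, each repeated according to its algebraic multiplicity.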


EXERCISES 


ooroor 
ore 
~~” 


1. Interpret the following matrices as linear transformations on C? and, in each 
case, find a basis of C? such that the matrix of the transformation with respect to 
that basis is triangular. 
a) ¢ i) ; 0 
0 1 (e) ( 
11 1 
(b) ( o): 0 
1 0 (f) ( 
(c) ( i): 1 
1 1 
(a) G a 
2. Two commutative linear transformations on a finite-dimensional vector space
$\mathcal{V}$ over an algebraically closed field can be simultaneously triangularized. In other
words, if AB = BA, then there exists a basis $\mathcal{X}$ such that both $[A; \mathcal{X}]$ and $[B; \mathcal{X}]$
are triangular. [Hint: to imitate the proof in § 56, it is desirable to find a subspace
$\mathfrak{M}$ of $\mathcal{V}$ invariant under both A and B. With this in mind, consider any proper
value $\lambda$ of A and examine the set of all solutions of $Ax = \lambda x$ for the role of $\mathfrak{M}$.]

3. Formulate and prove the analogues of the results of § 56 for triangular matrices
below the diagonal (instead of above it).


4. Suppose that A is a linear transformation over an n-dimensional vector space.
For every alternating n-linear form w, write $\tilde{A}w$ for the function defined by

\[ (\tilde{A}w)(x_1, \cdots, x_n) = w(Ax_1, x_2, \cdots, x_n) + w(x_1, Ax_2, \cdots, x_n) + \cdots + w(x_1, x_2, \cdots, Ax_n). \]

Since $\tilde{A}w$ is an alternating n-linear form, and, in fact, $\tilde{A}$ is a linear transformation
on the (one-dimensional) space of such forms, it follows that $\tilde{A}w = \tau(A) \cdot w$, where
$\tau(A)$ is a scalar.

(a) $\tau(0) = 0$.

(b) $\tau(1) = n$.

(c) $\tau(A + B) = \tau(A) + \tau(B)$.

(d) $\tau(\alpha A) = \alpha\tau(A)$.

(e) If the scalar field has characteristic zero and if A is a projection, then
$\tau(A) = \rho(A)$.

(f) If $(\alpha_{ij})$ is the matrix of A in some coordinate system, then $\tau(A) = \sum_i \alpha_{ii}$.

(g) $\tau(A') = \tau(A)$.

(h) $\tau(AB) = \tau(BA)$.

(i) For which permutations $\pi$ of the integers $1, \cdots, k$ is it true that
$\tau(A_1 \cdots A_k) = \tau(A_{\pi(1)} \cdots A_{\pi(k)})$ for all k-tuples $(A_1, \cdots, A_k)$ of linear transformations?

(j) If the field of scalars is algebraically closed, then $\tau(A) = \operatorname{tr} A$. (For this
reason trace is usually defined to be $\tau$; the most popular procedure is to use (f)
as the definition.)


5. (a) Suppose that the scalar field has characteristic zero. Prove that if $E_1, \cdots, E_k$
and $E_1 + \cdots + E_k$ are projections, then $E_iE_j = 0$ whenever $i \neq j$. (Hint:
from the fact that $\operatorname{tr}(E_1 + \cdots + E_k) = \operatorname{tr}(E_1) + \cdots + \operatorname{tr}(E_k)$ conclude that the
range of $E_1 + \cdots + E_k$ is the direct sum of the ranges of $E_1, \cdots, E_k$.)

(b) If $A_1, \cdots, A_k$ are linear transformations on an n-dimensional vector space,
and if $A_1 + \cdots + A_k = 1$ and $\rho(A_1) + \cdots + \rho(A_k) \leq n$, then each $A_i$ is a projection
and $A_iA_j = 0$ whenever $i \neq j$. (Start with k = 2 and proceed by induction; use
a direct sum argument as in (a).)

6. (a) If A is a linear transformation on a finite-dimensional vector space over
a field of characteristic zero, and if tr A = 0, then there exists a basis $\mathcal{X}$ such that
if $[A; \mathcal{X}] = (\alpha_{ij})$, then $\alpha_{ii} = 0$ for all i. (Hint: using the fact that A is not a scalar,
prove first that there exists a vector x such that x and Ax are linearly independent.
This proves that $\alpha_{11}$ can be made to vanish; proceed by induction.)

(b) Show that if the characteristic is not zero, the conclusion of (a) is false.
(Hint: if the characteristic is 2, compute BC - CB, where
$B = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ and $C = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}$.)


§ 57. Nilpotence


As an aid to getting a representation theorem more informative than the
triangular one, we proceed to introduce and to study a very special but
useful class of transformations. A linear transformation A is called
nilpotent if there exists a strictly positive integer q such that $A^q = 0$; the least
such integer q is the index of nilpotence.
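The definition of nilpotence and of its index can be tested mechanically on a matrix. The sketch below is an illustration under assumptions of our own: the helper names (`matmul`, `matvec`, `nilpotence_index`, `rank`) are hypothetical, the example matrix is an arbitrary choice, and the search for the index stops after n powers because a nilpotent transformation on an n-dimensional space satisfies $A^n = 0$. It also forms the vectors $x_0, Ax_0, \cdots, A^{q-1}x_0$ for a vector $x_0$ with $A^{q-1}x_0 \neq 0$ and counts how many of them are linearly independent.

```python
from fractions import Fraction


def matmul(x, y):
    """Ordinary product of two square matrices given as lists of rows."""
    n = len(x)
    return [
        [sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def matvec(a, v):
    """Apply the matrix a to the coordinate vector v."""
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]


def nilpotence_index(a):
    """Least q >= 1 with A^q = 0, or None if no power up to n vanishes."""
    n = len(a)
    power = [row[:] for row in a]
    for q in range(1, n + 1):
        if all(entry == 0 for row in power for entry in row):
            return q
        power = matmul(power, a)
    return None


def rank(vectors):
    """Rank of a list of vectors, by Gaussian elimination over the rationals."""
    rows = [[Fraction(c) for c in vec] for vec in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


# An arbitrary nilpotent matrix on a 4-dimensional space (illustration only).
A = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
q = nilpotence_index(A)

# A vector x0 whose image under A^(q-1) is not zero, and its chain of images.
x0 = [0, 0, 1, 5]
chain = [x0]
for _ in range(q - 1):
    chain.append(matvec(A, chain[-1]))

print(q, rank(chain))
```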


THEOREM 1. If A is a nilpotent linear transformation of index q on a
finite-dimensional vector space $\mathcal{V}$, and if $x_0$ is a vector for which
$A^{q-1}x_0 \neq 0$, then the vectors $x_0, Ax_0, \cdots, A^{q-1}x_0$ are linearly independent. If
$\mathcal{H}$ is the subspace spanned by these vectors, then there exists a subspace $\mathcal{K}$
such that $\mathcal{V} = \mathcal{H} \oplus \mathcal{K}$ and such that the pair $(\mathcal{H}, \mathcal{K})$ reduces A.


PROOF. To prove the asserted linear independence, suppose that
$\sum_{i=0}^{q-1} \alpha_i A^i x_0 = 0$, and let j be the least index such that $\alpha_j \neq 0$. (We do
not exclude the possibility j = 0.) Dividing through by $-\alpha_j$ and changing
the notation in an obvious way, we obtain a relation of the form

\[ A^j x_0 = \sum_{i=j+1}^{q-1} \alpha_i A^i x_0 = A^{j+1}\Bigl(\sum_{i=j+1}^{q-1} \alpha_i A^{i-j-1} x_0\Bigr) = A^{j+1}y. \]

It follows from the definition of q that

\[ A^{q-1}x_0 = A^{q-j-1}A^j x_0 = A^{q-j-1}A^{j+1}y = A^q y = 0; \]

since this contradicts the choice of $x_0$, we must have $\alpha_j = 0$ for each j.

It is clear that $\mathcal{H}$ is invariant under A; to construct $\mathcal{K}$ we go by induction
on the index q of nilpotence. If q = 1, the result is trivial; we now
assume the theorem for q - 1. The range $\mathcal{R}$ of A is a subspace that is
invariant under A; restricted to $\mathcal{R}$ the linear transformation A is nilpotent
of index q - 1. We write $\mathcal{H}_0 = \mathcal{H} \cap \mathcal{R}$ and $y_0 = Ax_0$; then $\mathcal{H}_0$ is spanned
by the linearly independent vectors $y_0, Ay_0, \cdots, A^{q-2}y_0$. The induction
hypothesis may be applied, and we may conclude that $\mathcal{R}$ is the direct sum
of $\mathcal{H}_0$ and some other invariant subspace $\mathcal{K}_0$.

We write $\mathcal{K}_1$ for the set of all vectors $x$ such that $Ax$ is in
$\mathcal{K}_0$; it is clear that $\mathcal{K}_1$ is a subspace. The temptation is
great to set $\mathcal{K} = \mathcal{K}_1$ and to attempt to prove that $\mathcal{K}$
has the desired properties. Unfortunately this need not be true; $\mathcal{H}$ and
$\mathcal{K}_1$ need not be disjoint. (It is true, but we shall not use the fact, that
the intersection of $\mathcal{H}$ and $\mathcal{K}_1$ is contained in the null-space
of $A$.) That, in spite of this, $\mathcal{K}_1$ is useful is caused by the fact that
$\mathcal{H} + \mathcal{K}_1 = \mathcal{V}$. To prove this, observe that $Ax$ is in
$\mathcal{R}$ for every $x$, and, consequently, $Ax = y + z$ with $y$ in
$\mathcal{H}_0$ and $z$ in $\mathcal{K}_0$. The general element of $\mathcal{H}_0$ is a
linear combination of $Ax_0, \ldots, A^{q-1}x_0$; hence we have

$$y = \sum_{i=1}^{q-1} \alpha_i A^i x_0 = A\Bigl(\sum_{i=0}^{q-2} \alpha_{i+1} A^i x_0\Bigr) = Ay_1,$$

where $y_1$ is in $\mathcal{H}$. It follows that $Ax = Ay_1 + z$, or $A(x - y_1) = z$,
so that $A(x - y_1)$ is in $\mathcal{K}_0$. This means that $x - y_1$ is in
$\mathcal{K}_1$, so that $x$ is the sum of an element (namely $y_1$) of $\mathcal{H}$
and an element (namely $x - y_1$) of $\mathcal{K}_1$.

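As far as disjointness is concerned, we can say at least that
$\mathcal{H} \cap \mathcal{K}_0 = \mathcal{O}$. To prove this, suppose that $x$ is in
$\mathcal{H} \cap \mathcal{K}_0$, and observe first that $Ax$ is in $\mathcal{H}_0$
(since $x$ is in $\mathcal{H}$). Since $\mathcal{K}_0$ is also invariant under $A$, the
vector $Ax$ belongs to $\mathcal{K}_0$ along with $x$, so that $Ax = 0$. From this we
infer that $x$ is in $\mathcal{H}_0$. (Since $x$ is in $\mathcal{H}$, we have
$x = \sum_{i=0}^{q-1} \alpha_i A^i x_0$; and therefore
$0 = Ax = \sum_{i=1}^{q-1} \alpha_{i-1} A^i x_0$; from the linear independence of the
$A^i x_0$ it follows that $\alpha_0 = \cdots = \alpha_{q-2} = 0$, so that
$x = \alpha_{q-1} A^{q-1} x_0$.) We have proved that if $x$ belongs to
$\mathcal{H} \cap \mathcal{K}_0$, then it belongs also to
$\mathcal{H}_0 \cap \mathcal{K}_0$, and hence that $x = 0$.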

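The situation now is this: $\mathcal{H}$ and $\mathcal{K}_1$ together span
$\mathcal{V}$, and $\mathcal{K}_1$ contains the two disjoint subspaces $\mathcal{K}_0$
and $\mathcal{H} \cap \mathcal{K}_1$. If we let $\mathcal{K}'_0$ be any complement of
$\mathcal{K}_0 \oplus (\mathcal{H} \cap \mathcal{K}_1)$ in $\mathcal{K}_1$, that is, if

$$\mathcal{K}'_0 \oplus \mathcal{K}_0 \oplus (\mathcal{H} \cap \mathcal{K}_1) = \mathcal{K}_1,$$

then we may write $\mathcal{K} = \mathcal{K}'_0 \oplus \mathcal{K}_0$; we assert that
this $\mathcal{K}$ has the desired properties. In the first place,
$\mathcal{K} \subset \mathcal{K}_1$ and $\mathcal{K}$ is disjoint from
$\mathcal{H} \cap \mathcal{K}_1$; it follows that
$\mathcal{H} \cap \mathcal{K} = \mathcal{O}$. In the second place,
$\mathcal{H} \oplus \mathcal{K}$ contains both $\mathcal{H}$ and $\mathcal{K}_1$, so
that $\mathcal{H} \oplus \mathcal{K} = \mathcal{V}$. Finally, $\mathcal{K}$ is
invariant under $A$, since the fact that $\mathcal{K} \subset \mathcal{K}_1$ implies
that $A\mathcal{K} \subset \mathcal{K}_0 \subset \mathcal{K}$. The proof of the theorem
is complete.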

Later we shall need the following remark. If $\tilde{x}_0$ is any other vector for
which $A^{q-1}\tilde{x}_0 \neq 0$, if $\tilde{\mathcal{H}}$ is the subspace spanned by
the vectors $\tilde{x}_0, A\tilde{x}_0, \ldots, A^{q-1}\tilde{x}_0$, and if, finally,
$\tilde{\mathcal{K}}$ is any subspace that together with $\tilde{\mathcal{H}}$ reduces
$A$, then the behavior of $A$ on $\tilde{\mathcal{H}}$ and $\tilde{\mathcal{K}}$ is
the same as its behavior on $\mathcal{H}$ and $\mathcal{K}$ respectively. (In other
words, in spite of the apparent non-uniqueness in the statement of Theorem 1,
everything is in fact uniquely determined up to isomorphisms.) The truth of this
remark follows from the fact that the index of nilpotence of $A$ on $\mathcal{K}$
($r$, say) is the same as the index of nilpotence of $A$ on $\tilde{\mathcal{K}}$
($\tilde{r}$, say). This fact, in turn, is proved as follows. Since
$A^r\mathcal{V} = A^r\mathcal{H} + A^r\mathcal{K}$ and also
$A^r\mathcal{V} = A^r\tilde{\mathcal{H}} + A^r\tilde{\mathcal{K}}$ (these results
depend on the invariance of all the subspaces involved), it follows that the
dimensions of the right sides of these equations may be equated, and hence that
$(q - r) + 0 = (q - r) + (\tilde{r} - r)$.

Using Theorem 1 we can find a complete geometric characterization of 
nilpotent transformations. 


THEOREM 2. If $A$ is a nilpotent linear transformation of index $q$ on a
finite-dimensional vector space $\mathcal{V}$, then there exist positive integers
$r, q_1, \ldots, q_r$ and vectors $x_1, \ldots, x_r$ such that (i)
$q_1 \geq \cdots \geq q_r$, (ii) the vectors

$$\begin{array}{l}
x_1,\ Ax_1,\ \ldots,\ A^{q_1-1}x_1, \\
x_2,\ Ax_2,\ \ldots,\ A^{q_2-1}x_2, \\
\qquad\vdots \\
x_r,\ Ax_r,\ \ldots,\ A^{q_r-1}x_r
\end{array}$$

form a basis for $\mathcal{V}$, and (iii)
$A^{q_1}x_1 = A^{q_2}x_2 = \cdots = A^{q_r}x_r = 0$. The integers
$r, q_1, \ldots, q_r$ form a complete set of isomorphism invariants of $A$.
If, in other words, $B$ is any other nilpotent linear transformation on a
finite-dimensional vector space $\mathcal{W}$, then a necessary and sufficient
condition that there exist an isomorphism $T$ between $\mathcal{V}$ and $\mathcal{W}$
such that $TAT^{-1} = B$ is that the integers $r, q_1, \ldots, q_r$ attached to $B$
be the same as the ones attached to $A$.


PROOF. We write $q_1 = q$ and we choose $x_1$ to be any vector for which
$A^{q_1-1}x_1 \neq 0$. The subspace spanned by $x_1, Ax_1, \ldots, A^{q_1-1}x_1$ is
invariant under $A$, and, by Theorem 1, possesses an invariant complement, which,
naturally, has strictly lower dimension than $\mathcal{V}$. On this complementary
subspace $A$ is nilpotent of index $q_2$, say; we apply the same reduction procedure
to this subspace (beginning with a vector $x_2$ for which $A^{q_2-1}x_2 \neq 0$).
We continue thus by induction till we exhaust the space. This proves the
existential part of the theorem; the remaining part follows from the uniqueness
(up to isomorphisms) of the decomposition given by Theorem 1.

With respect to the basis $\{A^i x_j\}$ described in Theorem 2, the matrix of
$A$ takes on a particularly simple form. Every matrix element not on the
diagonal just below the main diagonal vanishes (that is, $\alpha_{ij} \neq 0$ implies
$j = i - 1$), and the elements below the main diagonal begin (at top) with
a string of 1’s followed by a single 0, then go on with another string of 1’s 
followed by a 0, and continue so on to the end, with the lengths of the 
strings of 1’s monotonely decreasing (or, at any rate, non-increasing). 
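
The matrix just described can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the helper names `nilpotent_matrix`, `matmul`,
and `nilpotence_index` are hypothetical, the chain lengths 3, 2, 1 are chosen
arbitrarily, and column j of the matrix is taken to hold the coordinates of the image
of the j-th basis vector.

```python
def nilpotent_matrix(block_sizes):
    """Matrix of A in the basis x_1, A x_1, ..., A^(q_1 - 1) x_1, x_2, ...

    Assumption: column j holds the coordinates of A applied to basis vector j,
    so every chain contributes 1's just below the main diagonal.
    """
    n = sum(block_sizes)
    a = [[0] * n for _ in range(n)]
    start = 0
    for q in block_sizes:
        for k in range(q - 1):
            # A sends basis vector start+k to basis vector start+k+1.
            a[start + k + 1][start + k] = 1
        start += q
    return a


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def nilpotence_index(a):
    """Least q >= 1 with a^q = 0; raises if a is not nilpotent."""
    n = len(a)
    power, q = a, 1
    while any(any(row) for row in power):
        power = matmul(power, a)
        q += 1
        if q > n:
            raise ValueError("matrix is not nilpotent")
    return q


A = nilpotent_matrix([3, 2, 1])           # q_1 = 3, q_2 = 2, q_3 = 1
for row in A:
    print(row)
print("index:", nilpotence_index(A))      # index: 3
```

The entries just below the main diagonal read 1, 1, 0, 1, 0: a string of two 1's, a
single 0, a string of one 1, and a final 0, with the index of nilpotence equal to the
longest chain length $q_1$.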

Observe that our standing assumption about the algebraic closure of the 
field of scalars was not used in this section. 


EXERCISES 


1. Does there exist a nilpotent transformation of index 3 on a 2-dimensional 
space? 


2. (a) Prove that a nilpotent linear transformation on a finite-dimensional 
vector space has trace zero. 

(b) Prove that if A and B are linear transformations (on the same finite-di- 
mensional vector space) and if C = AB — BA, then 1 — C is not nilpotent. 


3. Prove that if $A$ is a nilpotent linear transformation of index $q$ on a
finite-dimensional vector space, then

$$\nu(A^{k+1}) + \nu(A^{k-1}) \leq 2\nu(A^k)$$

for $k = 1, \ldots, q - 1$.
4. If $A$ is a linear transformation (on a finite-dimensional vector space over an
algebraically closed field), then there exist linear transformations $B$ and $C$ such
that $A = B + C$, $B$ is diagonable, $C$ is nilpotent, and $BC = CB$; the
transformations $B$ and $C$ are uniquely determined by these conditions.


§ 58. Jordan form 


It is sound geometric intuition that makes most of us conjecture that,
for linear transformations, being invertible and being in some sense zero 
are exactly opposite notions. Our disappointment in finding that the range 
and the null-space need not be disjoint is connected with this conjecture. 
The situation can be straightened out by relaxing the sense in which we 
interpret “being zero”; for most practical purposes a linear transformation 
some power of which is zero (that is, a nilpotent transformation) is as zeroish 
as we can expect it to be. Although we cannot say that a linear transforma- 
tion is either invertible or “zero” even in the extended sense of zeroness, we 
can say how any transformation is made up of these two extreme kinds. 




THEOREM 1. Every linear transformation $A$ on a finite-dimensional vector
space $\mathcal{V}$ is the direct sum of a nilpotent transformation and an invertible
transformation.


PROOF. We consider the null-space of the $k$-th power of $A$; this is a subspace
$\mathcal{N}_k = \mathcal{N}(A^k)$. Clearly
$\mathcal{N}_1 \subset \mathcal{N}_2 \subset \cdots$. We assert first that if ever
$\mathcal{N}_k = \mathcal{N}_{k+1}$, then $\mathcal{N}_k = \mathcal{N}_{k+j}$ for all
positive integers $j$. Indeed, if $A^{k+j}x = 0$, then $A^{k+1}A^{j-1}x = 0$, whence
(by the fact that $\mathcal{N}_k = \mathcal{N}_{k+1}$) it follows that
$A^k A^{j-1}x = 0$, and therefore that $A^{k+j-1}x = 0$. In other words,
$\mathcal{N}_{k+j}$ is contained in (and therefore equal to) $\mathcal{N}_{k+j-1}$;
induction on $j$ establishes our assertion.

Since $\mathcal{V}$ is finite-dimensional, the subspaces $\mathcal{N}_k$ cannot
continue to increase indefinitely; let $q$ be the smallest positive integer for which
$\mathcal{N}_q = \mathcal{N}_{q+1}$. It is clear that $\mathcal{N}_q$ is invariant
under $A$ (in fact each $\mathcal{N}_k$ is such). We write
$\mathcal{R}_k = \mathcal{R}(A^k)$ for the range of $A^k$ (so that, again, it is clear
that $\mathcal{R}_q$ is invariant under $A$); we shall prove that
$\mathcal{V} = \mathcal{N}_q \oplus \mathcal{R}_q$ and that $A$ on $\mathcal{N}_q$ is
nilpotent, whereas on $\mathcal{R}_q$ it is invertible.

If $x$ is a vector common to $\mathcal{N}_q$ and $\mathcal{R}_q$, then $A^q x = 0$
and $x = A^q y$ for some $y$. It follows that $A^{2q}y = 0$, and hence, from the
definition of $q$, that $x = A^q y = 0$. We have shown thus that the range and the
null-space of $A^q$ are disjoint; a dimensionality argument (see § 50, Theorem 1)
shows that they span $\mathcal{V}$, so that $\mathcal{V}$ is their direct sum. It
follows from the definitions of $q$ and $\mathcal{N}_q$ that $A$ on $\mathcal{N}_q$ is
nilpotent of index $q$. If, finally, $x$ is in $\mathcal{R}_q$ (so that $x = A^q y$
for some $y$) and if $Ax = 0$, then $A^{q+1}y = 0$, whence $x = A^q y = 0$; this shows
that $A$ is invertible on $\mathcal{R}_q$. The proof of Theorem 1 is complete.

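The decomposition of $A$ into its nilpotent and invertible parts is unique.
Suppose, indeed, that $\mathcal{V} = \mathcal{H} \oplus \mathcal{K}$ so that $A$ on
$\mathcal{H}$ is nilpotent and $A$ on $\mathcal{K}$ is invertible. Since
$\mathcal{H} \subset \mathcal{N}(A^k)$ for some $k$, it follows that
$\mathcal{H} \subset \mathcal{N}_q$, and, since $\mathcal{K} \subset \mathcal{R}(A^k)$
for all $k$, it follows that $\mathcal{K} \subset \mathcal{R}_q$; these facts together
imply that $\mathcal{H} = \mathcal{N}_q$ and $\mathcal{K} = \mathcal{R}_q$.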
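
The integer $q$ of the proof can be found numerically by watching the nullities of
the successive powers of $A$. The following is a minimal sketch, not a definitive
implementation: the names `rank` and `stabilizing_power` are hypothetical, exact
rational arithmetic from the standard `fractions` module is assumed, and the sample
matrix (a nilpotent 2-by-2 block next to the invertible 1-by-1 block 2) is chosen
only for illustration.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def rank(m):
    """Rank by Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in m]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def stabilizing_power(a):
    """Smallest q >= 1 with nullity(a^q) == nullity(a^(q+1)), and the nullities seen.

    Because the null-spaces are nested, equal nullities mean equal null-spaces.
    """
    n = len(a)
    nullities = []
    power = a
    while True:
        nullities.append(n - rank(power))
        if len(nullities) >= 2 and nullities[-1] == nullities[-2]:
            return len(nullities) - 1, nullities
        power = matmul(power, a)


A = [[0, 0, 0],
     [1, 0, 0],
     [0, 0, 2]]
q, nullities = stabilizing_power(A)
print(q, nullities)   # 2 [1, 2, 2]
```

Here the null-space of $A^2$ has dimension 2 and the range of $A^2$ has dimension 1,
so the two subspaces together account for the whole 3-dimensional space.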

We can now use our results on nilpotent transformations to study the 
structure of arbitrary transformations. The method of getting a nilpotent 
transformation out of an arbitrary one may seem like a conjuring trick, but 
it is a useful trick, which is often employed. What is essential is the guar- 
anteed existence of proper values; for that reason we continue to assume 
that the scalar field is algebraically closed (see § 55). 


THEOREM 2. If $A$ is a linear transformation on a finite-dimensional vector
space $\mathcal{V}$, and if $\lambda_1, \ldots, \lambda_p$ are the distinct proper
values of $A$ with respective algebraic multiplicities $m_1, \ldots, m_p$, then
$\mathcal{V}$ is the direct sum of $p$ subspaces $\mathcal{M}_1, \ldots, \mathcal{M}_p$
of respective dimensions $m_1, \ldots, m_p$, such that each $\mathcal{M}_j$ is
invariant under $A$ and such that $A - \lambda_j$ is nilpotent on $\mathcal{M}_j$.

PROOF. Take any fixed $j = 1, \ldots, p$, and consider the linear transformation
$A_j = A - \lambda_j$. To $A_j$ we may apply the decomposition of Theorem 1
to obtain subspaces $\mathcal{M}_j$ and $\mathcal{N}_j$ such that $A_j$ is nilpotent
on $\mathcal{M}_j$ and invertible on $\mathcal{N}_j$. Since $\mathcal{M}_j$ is
invariant under $A_j$, it is also invariant under $A_j + \lambda_j = A$. Hence, for
every $\lambda$, the determinant of $A - \lambda$ is the product of the two
corresponding determinants for the two linear transformations that $A$ becomes when
we consider it on $\mathcal{M}_j$ and $\mathcal{N}_j$ separately. Since the only
proper value of $A$ on $\mathcal{M}_j$ is $\lambda_j$, and since $A$ on
$\mathcal{N}_j$ does not have the proper value $\lambda_j$ (that is, $A - \lambda_j$
is invertible on $\mathcal{N}_j$), it follows that the dimension of $\mathcal{M}_j$
is exactly $m_j$ and that each of the subspaces $\mathcal{M}_j$ is disjoint from the
span of all the others. A dimension argument proves that
$\mathcal{M}_1 \oplus \cdots \oplus \mathcal{M}_p = \mathcal{V}$ and thereby concludes
the proof of the theorem.

We proceed to describe the principal results of this section and the
preceding one in matricial language. If $A$ is a linear transformation on a
finite-dimensional vector space $\mathcal{V}$, then with respect to a suitable basis
of $\mathcal{V}$, the matrix of $A$ has the following form. Every element not on or
immediately below the main diagonal vanishes. On the main diagonal there
appear the distinct proper values of $A$, each a number of times equal to
its algebraic multiplicity. Below any particular proper value there appear
only 1's and 0's, and these in the following way: there are chains of 1's
followed by a single 0, with the lengths of the chains decreasing as we read
from top to bottom. This matrix is the Jordan form or the classical canonical
form of $A$; we have $B = TAT^{-1}$ if and only if the classical canonical
forms of $A$ and $B$ are the same except for the order of the proper values.
(Thus, in particular, a linear transformation $A$ is diagonable if and only if
its classical canonical form is already diagonal, that is, if every chain of
1's has length zero.)

Let us introduce some notation. Let $A$ have $p$ distinct proper values
$\lambda_1, \ldots, \lambda_p$, with algebraic multiplicities $m_1, \ldots, m_p$, as
before; let the number of chains of 1's under $\lambda_j$ be $r_j$, and let the
lengths of these chains be $q_{j,1} - 1, q_{j,2} - 1, \ldots, q_{j,r_j} - 1$. The
polynomial $e_{ji}$ defined by $e_{ji}(\lambda) = (\lambda - \lambda_j)^{q_{j,i}}$ is
called an elementary divisor of $A$ of multiplicity $q_{j,i}$ belonging to the proper
value $\lambda_j$. An elementary divisor is called simple if its
multiplicity is 1 (so that the corresponding chain length is 0); we see that a
linear transformation is diagonable if and only if its elementary divisors
are simple.

To illustrate the power of Theorem 2 we make one application. We
may express the fact that the transformation $A - \lambda_j$ on $\mathcal{M}_j$ is
nilpotent of index $q_{j,1}$ by saying that the transformation $A$ on $\mathcal{M}_j$
is annulled by the polynomial $e_{j1}$. It follows that $A$ on $\mathcal{V}$ is
annulled by the product of these polynomials (that is, by the product of the
elementary divisors of the highest multiplicities); this product is called the
minimal polynomial of $A$. It is quite easy to see (since the index of nilpotence of
$A - \lambda_j$ on $\mathcal{M}_j$ is exactly $q_{j,1}$) that this polynomial is
uniquely determined (up to a multiplicative factor) as the polynomial of smallest
degree that annuls $A$. Since the characteristic polynomial of $A$ is the product of
all the elementary divisors, and therefore a multiple of the minimal polynomial, we
obtain the Hamilton-Cayley equation: every linear transformation is annulled by
its characteristic polynomial.
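
A small numerical check may make the relation between elementary divisors, minimal
polynomial, and characteristic polynomial concrete. The sketch below is only an
illustration: the sample matrix and the helper names `shift` and `is_zero` are
assumptions made here. The matrix is already in Jordan form, with the single proper
value 2 and chains of lengths 1 and 0 below the diagonal, so its elementary divisors
are $(\lambda - 2)^2$ and $(\lambda - 2)$.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def shift(a, lam):
    """The matrix of A - lam (lam times the identity is subtracted)."""
    n = len(a)
    return [[a[i][j] - (lam if i == j else 0) for j in range(n)]
            for i in range(n)]


def is_zero(m):
    return all(v == 0 for row in m for v in row)


A = [[2, 0, 0],
     [1, 2, 0],
     [0, 0, 2]]

N = shift(A, 2)        # A - 2
N2 = matmul(N, N)      # (A - 2)^2, the candidate minimal polynomial evaluated at A
N3 = matmul(N2, N)     # (A - 2)^3, the characteristic polynomial evaluated at A

print(is_zero(N), is_zero(N2), is_zero(N3))   # False True True
```

The first power does not annul $A$, the second does, and the third (the
characteristic polynomial) does as a consequence, in agreement with the
Hamilton-Cayley equation.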


EXERCISES 


1. Find the Jordan form of
$\begin{pmatrix} 1 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix}$.


2. What is the maximum number of pairwise non-similar linear transformations
on a three-dimensional vector space, each of which has the characteristic
polynomial $(\lambda - 1)^3$?


3. Does every invertible linear transformation have a square root? (To say that
$A$ is a square root of $B$ means, of course, that $A^2 = B$.)


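4. (a) Prove that if $\omega$ is a cube root of 1 ($\omega \neq 1$), then the matrices

$$\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & \omega & 0 \\ 0 & 0 & \omega^2 \end{pmatrix}$$

are similar.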


(b) Discover and prove a generalization of (a) to higher dimensions. 


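5. (a) Prove that the matrices
$\begin{pmatrix} 0 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}$ and
$\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}$ are similar.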


(b) Discover and prove a generalization of (a) to higher dimensions. 
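6. (a) Show that the matrices

$$\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 3 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$

are similar (over, say, the field of complex numbers).
(b) Discover and prove a generalization of (a) to higher dimensions.
7. If two real matrices are similar over $\mathcal{C}$, then they are similar over $\mathcal{R}$.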
8. Prove that every matrix is similar to its transpose. 


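9. If $A$ and $B$ are $n$-by-$n$ matrices such that the $2n$-by-$2n$ matrices
$\begin{pmatrix} A & 0 \\ 0 & A \end{pmatrix}$ and
$\begin{pmatrix} B & 0 \\ 0 & B \end{pmatrix}$ are similar, then $A$ and $B$ are similar.
10. Which of the following matrices are diagonable (over the field of complex
numbers)?

(a) $\begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$,
(b) $\begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$,
(c) $\begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix}$,
(d) $\begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix}$,
(e) $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$.

What about the field of real numbers?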
11. Show that the matrix 


ooo 
oo m 


0 
0 ` 
1 


Orno 


1000 


is diagonable over the field of complex numbers but not over the field of real num- 
bers. 


12. Let $\pi$ be a permutation of the integers $\{1, \ldots, n\}$; if
$x = (\xi_1, \ldots, \xi_n)$ is a vector in $\mathcal{C}^n$, write
$Ax = (\xi_{\pi(1)}, \ldots, \xi_{\pi(n)})$. Prove that $A$ is diagonable and
find a basis with respect to which the matrix of $A$ is diagonal.


13. Suppose that $A$ is a linear transformation and that $\mathcal{M}$ is a subspace
invariant under $A$. Prove that if $A$ is diagonable, then so also is the restriction
of $A$ to $\mathcal{M}$.


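14. Under what conditions on the complex numbers $\alpha_1, \ldots, \alpha_n$ is the matrix

$$\begin{pmatrix} 0 & \cdots & 0 & \alpha_1 \\ 0 & \cdots & \alpha_2 & 0 \\ \vdots & & & \vdots \\ \alpha_n & \cdots & 0 & 0 \end{pmatrix}$$

diagonable (over the field of complex numbers)?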


15. Are the following assertions true or false? 

(a) A real two-by-two matrix with a negative determinant is similar to a diagonal 
matrix. 

(b) If $A$ is a linear transformation on a complex vector space, and if $A^k = 1$
for some positive integer $k$, then $A$ is diagonable.

(c) If $A$ is a nilpotent linear transformation on a finite-dimensional vector
space, then $A$ is diagonable.

16. If $A$ is a linear transformation on a finite-dimensional vector space over an
algebraically closed field, and if every proper value of $A$ has algebraic
multiplicity 1, then $A$ is diagonable.
17. If the minimal polynomial of a linear transformation $A$ on an $n$-dimensional
vector space has degree $n$, then $A$ is diagonable.
18. Find the minimal polynomials of all projections and all involutions. 


19. What is the minimal polynomial of the matrix


1 0 Oœ. 0 
0 Az 0 0 
0 0 d 0}? 
0 0 0 An 


20. (a) What is the minimal polynomial of the differentiation operator on $\mathcal{P}_n$?
(b) What is the minimal polynomial of the transformation $A$ on $\mathcal{P}_n$
defined by $(Ax)(t) = x(t + 1)$?


21. If $A$ is a linear transformation with minimal polynomial $p$, and if $q$ is a
polynomial such that $q(A) = 0$, then $q$ is divisible by $p$.


22. (a) If $A$ and $B$ are linear transformations, if $p$ is a polynomial such that
$p(AB) = 0$, and if $q(t) = tp(t)$, then $q(BA) = 0$.

(b) What can be inferred from (a) about the relation between the minimal 
polynomials of AB and of BA? 


23. A linear transformation is invertible if and only if the constant term of its 
minimal polynomial is different from zero. 


CHAPTER III 


ORTHOGONALITY 


§ 59. Inner products 


Let us now get our feet back on the ground. We started in Chapter I 
by pointing out that we wish to generalize certain elementary properties 
of certain elementary spaces such as R. In our study so far we have done 
this, but we have entirely omitted from consideration one aspect of $\mathcal{R}^2$.
We have studied the qualitative concept of linearity; what we have entirely 
ignored are the usual quantitative concepts of angle and length. In the 
present chapter we shall fill this gap; we shall superimpose on the vector 
spaces to be studied certain numerical functions, corresponding to the ordi- 
nary notions of angle and length, and we shall study the new structure 
(vector space plus given numerical function) so obtained. For the added 
depth of geometric insight we gain in this way, we must sacrifice some 
generality; throughout the rest of this book we shall have to assume that 
the underlying field of scalars is either the field $\mathcal{R}$ of real numbers or
the field $\mathcal{C}$ of complex numbers.

For a clue as to how to proceed, we first inspect $\mathcal{R}^2$. If
$x = (\xi_1, \xi_2)$ and $y = (\eta_1, \eta_2)$ are any two points in $\mathcal{R}^2$,
the usual formula for the distance between $x$ and $y$, or the length of the segment
joining $x$ and $y$, is $\sqrt{(\xi_1 - \eta_1)^2 + (\xi_2 - \eta_2)^2}$. It is
convenient to introduce the notation

$$\|x\| = \sqrt{\xi_1^2 + \xi_2^2}$$

for the distance from $x$ to the origin $0 = (0, 0)$; in this notation the
distance between $x$ and $y$ becomes $\|x - y\|$.

So much, for the present, for lengths and distances; what about angles?
It turns out that it is much more convenient to study, in the general case,
not any of the usual measures of angles but rather their cosines. (Roughly
speaking, the reason for this is that the angle, in the usual picture in the
circle of radius one, is the length of a certain circular arc, whereas the
cosine of the angle is the length of a line segment; the latter is much easier
to relate to our preceding study of linear functions.) Suppose then that
we let $\alpha$ be the angle between the segment from 0 to $x$ and the positive
$\xi_1$ axis, and let $\beta$ be the angle between the segment from 0 to $y$ and the
same axis; the angle between the two vectors $x$ and $y$ is $\alpha - \beta$, so that
its cosine is

$$\cos(\alpha - \beta) = \cos\alpha\cos\beta + \sin\alpha\sin\beta = \frac{\xi_1\eta_1 + \xi_2\eta_2}{\|x\| \cdot \|y\|}.$$

Consider the expression $\xi_1\eta_1 + \xi_2\eta_2$; by means of it we can express
both angle and length by very simple formulas. We have already seen that if
we know the distance between 0 and $x$ for all $x$, then we can compute the
distance between any $x$ and $y$; we assert now that if for every pair of vectors
$x$ and $y$ we are given the value of $\xi_1\eta_1 + \xi_2\eta_2$, then in terms of
this value we may compute all distances and all angles. Indeed, if we take $x = y$,
then $\xi_1\eta_1 + \xi_2\eta_2$ becomes $\xi_1^2 + \xi_2^2 = \|x\|^2$, and this takes
care of lengths; the cosine formula above gives us the angle in terms of
$\xi_1\eta_1 + \xi_2\eta_2$ and the two lengths $\|x\|$ and $\|y\|$. To have a concise
notation, let us write, for $x = (\xi_1, \xi_2)$ and $y = (\eta_1, \eta_2)$,

$$\xi_1\eta_1 + \xi_2\eta_2 = (x, y);$$

what we said above is summarized by the relations

$$\text{distance from } 0 \text{ to } x = \|x\| = \sqrt{(x, x)},$$

$$\text{distance from } x \text{ to } y = \|x - y\|,$$

$$\text{cosine of angle between } x \text{ and } y = \frac{(x, y)}{\|x\| \cdot \|y\|}.$$

The important properties of $(x, y)$, considered as a numerical function of
the pair of vectors $x$ and $y$, are the following: it is symmetric in $x$ and $y$, it
depends linearly on each of its two variables, and (unless $x = 0$) the value
of $(x, x)$ is always strictly positive. (The notational conflict between the
use of parentheses in $(x, y)$ and in $(\xi_1, \xi_2)$ is only apparent. It could
arise in two-dimensional spaces only, and even there confusion is easily avoided.)

Observe for a moment the much more trivial picture in $\mathcal{R}^1$. For
$x = (\xi_1)$ and $y = (\eta_1)$ we should have, in this case, $(x, y) = \xi_1\eta_1$
(and it is for this reason that $(x, y)$ is known as the inner product or scalar
product of $x$ and $y$). The angle between any two vectors is either 0 or $\pi$, so
that its cosine is either $+1$ or $-1$. This shows up the much greater sensitivity of
the function given by $(x, y)$, which takes on all possible numerical values.


§ 60. Complex inner products


What happens if we want to consider $\mathcal{C}^2$ instead of $\mathcal{R}^2$? The
generalization seems to lie right at hand; for $x = (\xi_1, \xi_2)$ and
$y = (\eta_1, \eta_2)$ (where now the $\xi$'s and $\eta$'s may be complex numbers), we
write $(x, y) = \xi_1\eta_1 + \xi_2\eta_2$, and we hope that the expressions
$\|x\| = \sqrt{(x, x)}$ and $\|x - y\|$ can be used as sensible measures of distance.
Observe, however, the following strange phenomenon (where $i = \sqrt{-1}$):

$$\|ix\|^2 = (ix, ix) = i(x, ix) = i^2(x, x) = -\|x\|^2.$$

This means that if $\|x\|$ is positive, that is, if $x$ is at a positive distance
from the origin, then $ix$ is not; in fact the distance from 0 to $ix$ is imaginary.
This is very unpleasant; surely it is reasonable to demand that whatever
it is that is going to play the role of $(x, y)$ in this case, it should have the
property that for $x = y$ it never becomes negative. A formal remedy lies
close at hand; we could try to write

$$(x, y) = \xi_1\bar{\eta}_1 + \xi_2\bar{\eta}_2$$

(where the bar denotes complex conjugation). In this definition the
expression $(x, y)$ loses much of its former beauty; it is no longer quite
symmetric in $x$ and $y$ and it is no longer quite linear in each of its variables.
But, and this is what prompted us to give our new definition,

$$(x, x) = \xi_1\bar{\xi}_1 + \xi_2\bar{\xi}_2 = |\xi_1|^2 + |\xi_2|^2$$

is surely never negative. It is a priori dubious whether a useful and elegant
theory can be built up on the basis of a function that fails to possess so
many of the properties that recommended it to our attention in the first
place; the apparent inelegance will be justified in what follows by its
success. A cheerful portent is this. Consider the space $\mathcal{C}^1$ (that is, the
set of all complex numbers). It is impossible to draw a picture of any configuration
in this space and then to be able to tell it apart from a configuration in
$\mathcal{R}^2$, but conceptually it is clearly a different space. The analogue of
$(x, y)$ in this space, for $x = (\xi_1)$ and $y = (\eta_1)$, is given by
$(x, y) = \xi_1\bar{\eta}_1$, and this expression does have a simple geometric
interpretation. If we join $x$ and $y$ to the origin by straight line segments,
$(x, y)$ will not, to be sure, be the cosine of the angle between the two segments;
it turns out that, for $\|x\| = \|y\| = 1$, its real part is exactly that cosine.

The complex conjugates that we were forced to introduce here will come 
back to plague us later; for the present we leave this heuristic introduction 
and turn to the formal work, after just one more comment on the notation. 
The similarity of the symbols (,) and [,], the one used here for inner product
and the other used earlier for linear functionals, is not accidental. We
shall show later that it is, in fact, only the presence of the complex conju- 
gation in (,) that makes it necessary to use for it a symbol different from 
[,]. For the present, however, we cannot afford the luxury of confusing 
the two. 
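
The "strange phenomenon" and its remedy can be tried out directly with Python's
built-in complex numbers. This is a minimal sketch, assuming the hypothetical helper
names `bilinear` (the naive product without conjugation) and `hermitian` (the product
with the conjugate on the second variable); the vector $x$ is an arbitrary choice.

```python
def bilinear(x, y):
    """Naive product xi_1*eta_1 + xi_2*eta_2, without conjugation."""
    return sum(a * b for a, b in zip(x, y))


def hermitian(x, y):
    """Product xi_1*conj(eta_1) + xi_2*conj(eta_2), conjugating the second variable."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


x = (1 + 2j, 3 - 1j)
ix = tuple(1j * a for a in x)

# Without conjugation, multiplying x by i flips the sign of (x, x).
print(bilinear(x, x), bilinear(ix, ix))

# With conjugation, (x, x) is real, non-negative, and unchanged by the factor i.
print(hermitian(x, x), hermitian(ix, ix))
```

The second line of output shows the same non-negative real value for $x$ and $ix$,
which is exactly the property the conjugation was introduced to secure.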


§ 61. Inner product spaces 


DEFINITION. An inner product in a (real or complex) vector space is a
(respectively, real or complex) numerically valued function of the ordered
pair of vectors $x$ and $y$, such that

(1) $(x, y) = \overline{(y, x)}$,
(2) $(\alpha_1 x_1 + \alpha_2 x_2, y) = \alpha_1(x_1, y) + \alpha_2(x_2, y)$,
(3) $(x, x) \geq 0$; $(x, x) = 0$ if and only if $x = 0$.


An inner product space is a vector space with an inner product. 


We observe that in the case of a real vector space, the conjugation in (1) 
may be ignored. In any case, however, real or complex, (1) implies that 
(x, x) is always real, so that the inequality in (3) makes sense. In an inner 
product space we shall use the notation 


$$\sqrt{(x, x)} = \|x\|;$$


the number $\|x\|$ is called the norm or length of the vector $x$. A real inner
product space is sometimes called a Euclidean space; its complex analogue 
is called a unitary space. 

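As examples of unitary spaces we may consider $\mathcal{C}^n$ and $\mathcal{P}$; in
the first case we write, for $x = (\xi_1, \ldots, \xi_n)$ and
$y = (\eta_1, \ldots, \eta_n)$,

$$(x, y) = \sum_{i=1}^{n} \xi_i\bar{\eta}_i,$$

and, in $\mathcal{P}$, we write

$$(x, y) = \int_0^1 x(t)\overline{y(t)}\,dt.$$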


The modifications that convert these examples into Euclidean spaces (that 
is, real inner product spaces) are obvious. 
In a unitary space we have 


(2') $(x, \alpha_1 y_1 + \alpha_2 y_2) = \bar{\alpha}_1(x, y_1) + \bar{\alpha}_2(x, y_2)$.


(To transform the left side of (2') into the right side, use (1), expand by
(2), and use (1) again.) This fact, together with the definition of an inner
product, explains the terminology sometimes used to describe properties
(1), (2), (3) (and their consequence (2')). According to that terminology
$(x, y)$ is a Hermitian symmetric (1), conjugate bilinear ((2) and (2')), and
positive definite (3) form. In a Euclidean space the conjugation in (2') may
be ignored along with the conjugation in (1); in that case $(x, y)$ is called a
symmetric, bilinear, and positive definite form. We observe that in either
case, the conditions on $(x, y)$ imply for $\|x\|$ the homogeneity property

$$\|\alpha x\| = |\alpha| \cdot \|x\|.$$

(Proof: $\|\alpha x\|^2 = (\alpha x, \alpha x) = \alpha\bar{\alpha}(x, x)$.)


§ 62. Orthogonality 


The most important relation among the vectors of an inner product
space is orthogonality. By definition, the vectors $x$ and $y$ are called
orthogonal if $(x, y) = 0$. We observe that this relation is symmetric; since
$(x, y) = \overline{(y, x)}$, it follows that $(x, y)$ and $(y, x)$ vanish together.
If we recall the motivation for the introduction of $(x, y)$, the terminology
explains itself; two vectors are orthogonal (or perpendicular) if the angle
between them is 90°, that is, if the cosine of the angle between them is 0.
Two subspaces are called orthogonal if every vector in each is orthogonal
to every vector in the other.

A set $\mathcal{X}$ of vectors is orthonormal if whenever both $x$ and $y$ are in
$\mathcal{X}$ it follows that $(x, y) = 0$ or $(x, y) = 1$ according as $x \neq y$ or
$x = y$. (If $\mathcal{X}$ is finite, say $\mathcal{X} = \{x_1, \ldots, x_n\}$, we have
$(x_i, x_j) = \delta_{ij}$.) We call an orthonormal set complete if it is not
contained in any larger orthonormal set.

To make our last definition in this connection, we observe first that an 
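orthonormal set is linearly independent. Indeed, if $\{x_1, \ldots, x_k\}$ is any
finite subset of an orthonormal set $\mathcal{X}$, then $\sum_i \alpha_i x_i = 0$
implies that

$$0 = \Bigl(\sum_i \alpha_i x_i, x_j\Bigr) = \sum_i \alpha_i (x_i, x_j) = \sum_i \alpha_i \delta_{ij} = \alpha_j;$$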


in other words, a linear combination of the x’s can vanish only if all the 
coefficients vanish. From this we conclude that in a finite-dimensional 
inner product space the number of vectors in an orthonormal set is always 
finite, and, in fact, not greater than the linear dimension of the space. 
We define, in this case, the orthogonal dimension of the space, as the largest 
number of vectors an orthonormal set can contain. 

Warning: for all we know at this stage, the concepts of orthogonality 
and orthonormal sets are vacuous. Trivial examples can be used to show 
that things are not so bad as all that; the vector 0, for instance, is always 
orthogonal to every vector, and, if the space contains a non-zero vector $x$,
then the set consisting of $x/\|x\|$ alone is an orthonormal set. We grant that
these examples are not very inspiring. For the present, however, we
remain content with them; soon we shall see that there are always "enough"
orthogonal vectors to operate with in comfort.

Observe also that we have no right to assume that the number of ele- 
ments in a complete orthonormal set is equal to the orthogonal dimension. 
The point is this: if we had an orthonormal set with that many elements, 
it would clearly be complete; it is conceivable, just the same, that some 
other set contains fewer elements, but is still complete because its nasty 
structure precludes the possibility of extending it. These difficulties are 
purely verbal and will evaporate the moment we start proving things; they 
occur only because from among the several possibilities for the definition 
of completeness we had to choose a definite one, and we must prove its 
equivalence with the others. 

We need some notation. If $\mathcal{E}$ is any set of vectors in an inner product
space $\mathcal{V}$, we denote by $\mathcal{E}^\perp$ the set of all vectors in
$\mathcal{V}$ that are orthogonal to every vector in $\mathcal{E}$. It is clear that
$\mathcal{E}^\perp$ is a subspace of $\mathcal{V}$ (whether or not $\mathcal{E}$ is
one), and that $\mathcal{E}$ is contained in
$\mathcal{E}^{\perp\perp} = (\mathcal{E}^\perp)^\perp$. It follows that the subspace
spanned by $\mathcal{E}$ is contained in $\mathcal{E}^{\perp\perp}$. In case
$\mathcal{E}$ is a subspace, we shall call $\mathcal{E}^\perp$ the orthogonal
complement of $\mathcal{E}$. We use the sign $\perp$ in order to be reminded of
orthogonality (or perpendicularity). In informal discussions,
$\mathcal{E}^\perp$ might be pronounced as "E perp."


EXERCISES 


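1. Given four complex numbers $\alpha$, $\beta$, $\gamma$, and $\delta$, try to define
an inner product in $\mathcal{C}^2$ by writing

$$(x, y) = \alpha\xi_1\bar{\eta}_1 + \beta\xi_2\bar{\eta}_1 + \gamma\xi_1\bar{\eta}_2 + \delta\xi_2\bar{\eta}_2$$

whenever $x = (\xi_1, \xi_2)$ and $y = (\eta_1, \eta_2)$. Under what conditions on
$\alpha$, $\beta$, $\gamma$, and $\delta$ does this equation define an inner product?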


2. Prove that if $x$ and $y$ are vectors in a unitary space, then

$$4(x, y) = \|x + y\|^2 - \|x - y\|^2 + i\|x + iy\|^2 - i\|x - iy\|^2.$$


3. If inner product in $\mathcal{P}_{n+1}$ is defined by
$(x, y) = \int_0^1 x(t)\overline{y(t)}\,dt$, and if $x_j(t) = t^j$,
$j = 0, \ldots, n - 1$, find a polynomial of degree $n$ orthogonal to
$x_0, x_1, \ldots, x_{n-1}$.


4. (a) Two vectors $x$ and $y$ in a real inner product space are orthogonal if and
only if $\|x + y\|^2 = \|x\|^2 + \|y\|^2$.

(b) Show that (a) becomes false if "real" is changed to "complex."

(c) Two vectors $x$ and $y$ in a complex inner product space are orthogonal if and
only if $\|\alpha x + \beta y\|^2 = \|\alpha x\|^2 + \|\beta y\|^2$ for all pairs of
scalars $\alpha$ and $\beta$.

(d) If $x$ and $y$ are vectors in a real inner product space, and if
$\|x\| = \|y\|$, then $x - y$ and $x + y$ are orthogonal. (Picture?) Discuss the
corresponding statement for complex spaces.

(e) If $x$ and $y$ are vectors in an inner product space, then

$$\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2.$$

Picture?


§ 63. Completeness 


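THEOREM 1. If $\mathcal{X} = \{x_1, \ldots, x_n\}$ is any finite orthonormal set in an
inner product space, if $x$ is any vector, and if $\alpha_i = (x, x_i)$, then
(Bessel's inequality)

$$\sum_i |\alpha_i|^2 \leq \|x\|^2.$$

The vector $x' = x - \sum_i \alpha_i x_i$ is orthogonal to each $x_j$ and,
consequently, to the subspace spanned by $\mathcal{X}$.

PROOF. For the first assertion:

$$\begin{aligned}
0 \leq \|x'\|^2 = (x', x') &= \Bigl(x - \sum_i \alpha_i x_i,\ x - \sum_j \alpha_j x_j\Bigr) \\
&= (x, x) - \sum_i \alpha_i (x_i, x) - \sum_j \bar{\alpha}_j (x, x_j) + \sum_i \sum_j \alpha_i \bar{\alpha}_j (x_i, x_j) \\
&= \|x\|^2 - \sum_i |\alpha_i|^2 - \sum_i |\alpha_i|^2 + \sum_i |\alpha_i|^2 \\
&= \|x\|^2 - \sum_i |\alpha_i|^2;
\end{aligned}$$

for the second assertion:

$$(x', x_j) = (x, x_j) - \sum_i \alpha_i (x_i, x_j) = \alpha_j - \alpha_j = 0.$$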
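
Both assertions of Theorem 1 can be checked on a small example in exact rational
arithmetic. The sketch below is an illustration only: the orthonormal set (two
vectors in real 3-space), the vector $x$, and the helper name `inner` are all
assumptions chosen here, and the real case is used so that no conjugation is needed.

```python
from fractions import Fraction as F


def inner(x, y):
    """Real inner product of two coordinate tuples."""
    return sum(a * b for a, b in zip(x, y))


# An orthonormal set that is not complete in real 3-space.
X = [(F(1), F(0), F(0)),
     (F(0), F(3, 5), F(4, 5))]
x = (F(1), F(2), F(3))

alphas = [inner(x, xi) for xi in X]          # alpha_i = (x, x_i)
bessel_sum = sum(a * a for a in alphas)      # sum of |alpha_i|^2
norm_sq = inner(x, x)                        # ||x||^2

# x' = x - sum_i alpha_i x_i
x_prime = tuple(c - sum(a * xi[k] for a, xi in zip(alphas, X))
                for k, c in enumerate(x))

print(bessel_sum, "<=", norm_sq)             # 349/25 <= 14
print([inner(x_prime, xi) for xi in X])      # both inner products vanish
```

The gap between the two sides of Bessel's inequality is exactly $\|x'\|^2$, as the
computation in the proof predicts.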


THEOREM 2. If $\mathcal{X}$ is any finite orthonormal set in an inner product space
$\mathcal{V}$, the following six conditions on $\mathcal{X}$ are equivalent to each other.

(1) The orthonormal set $\mathcal{X}$ is complete.

(2) If $(x, x_i) = 0$ for $i = 1, \ldots, n$, then $x = 0$.

(3) The subspace spanned by $\mathcal{X}$ is the whole space $\mathcal{V}$.

(4) If $x$ is in $\mathcal{V}$, then $x = \sum_i (x, x_i) x_i$.

(5) If $x$ and $y$ are in $\mathcal{V}$, then (Parseval’s identity)

$$(x, y) = \sum_i (x, x_i)(x_i, y).$$

(6) If $x$ is in $\mathcal{V}$, then

$$\|x\|^2 = \sum_i |(x, x_i)|^2.$$

PROOF. We shall establish the implications (1) ⇒ (2) ⇒ (3) ⇒ (4) ⇒
(5) ⇒ (6) ⇒ (1). Thus we first assume (1) and prove (2), then assume
(2) to prove (3), and so on till we finally prove (1) assuming (6).


(1) ⇒ (2). If $(x, x_i) = 0$ for all $i$ and $x \ne 0$, then we may adjoin
$x/\|x\|$ to $\mathcal{X}$ and thus obtain an orthonormal set larger than $\mathcal{X}$.

(2) ⇒ (3). If there is an $x$ that is not a linear combination of the $x_i$,
then, by the second part of Theorem 1, $x' = x - \sum_i (x, x_i) x_i$ is different
from 0 and is orthogonal to each $x_i$.

(3) ⇒ (4). If every $x$ has the form $x = \sum_j \alpha_j x_j$, then

$$(x, x_i) = \sum_j \alpha_j (x_j, x_i) = \alpha_i.$$

(4) ⇒ (5). If $x = \sum_i \alpha_i x_i$ and $y = \sum_j \beta_j x_j$, with $\alpha_i = (x, x_i)$ and
$\beta_j = (y, x_j)$, then

$$(x, y) = \Bigl(\sum_i \alpha_i x_i,\; \sum_j \beta_j x_j\Bigr) = \sum_{i,j} \alpha_i \bar\beta_j (x_i, x_j) = \sum_i \alpha_i \bar\beta_i.$$

(5) ⇒ (6). Set $x = y$.

(6) ⇒ (1). If $\mathcal{X}$ were contained in a larger orthogonal set, say if $x_0$ is
orthogonal to each $x_i$, then

$$\|x_0\|^2 = \sum_i |(x_0, x_i)|^2 = 0,$$

so that $x_0 = 0$.
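
As a numerical illustration (not part of the original argument), the following
Python sketch checks Bessel's inequality for an incomplete orthonormal set and
condition (6) of Theorem 2 for a complete one. The space $\mathbb{C}^3$ with its standard
inner product, the coordinate vectors, and the test vector `x` are assumptions
chosen only for the example.

```python
def inner(x, y):
    """Inner product on C^n: linear in x, conjugate-linear in y."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

# Assumed example data: the coordinate vectors of C^3 and one test vector.
e1 = [1, 0, 0]
e2 = [0, 1, 0]
e3 = [0, 0, 1]
x = [1 + 2j, -3, 0.5 - 1j]

def bessel_sum(x, orthonormal_set):
    """Sum of |(x, x_i)|^2 over the given orthonormal set."""
    return sum(abs(inner(x, e)) ** 2 for e in orthonormal_set)

def residual(x, orthonormal_set):
    """x' = x - sum_i (x, x_i) x_i, as in Theorem 1."""
    r = list(x)
    for e in orthonormal_set:
        alpha = inner(x, e)
        r = [ri - alpha * ei for ri, ei in zip(r, e)]
    return r

norm_sq = inner(x, x).real
incomplete = bessel_sum(x, [e1, e2])      # Bessel: at most ||x||^2
complete = bessel_sum(x, [e1, e2, e3])    # condition (6): equals ||x||^2
x_prime = residual(x, [e1, e2])           # orthogonal to e1 and e2

print(incomplete <= norm_sq)
print(abs(complete - norm_sq) < 1e-9)
print(abs(inner(x_prime, e1)) < 1e-12, abs(inner(x_prime, e2)) < 1e-12)
```

Dropping `e3` from the set leaves a strict inequality here because `x` has a
non-zero third coordinate.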


§ 64. Schwarz’s inequality 


THEOREM. If $x$ and $y$ are vectors in an inner product space, then (Schwarz’s
inequality)

$$|(x, y)| \le \|x\| \cdot \|y\|.$$


PROOF. If $y = 0$, both sides vanish. If $y \ne 0$, then the set consisting
of the vector $y/\|y\|$ is orthonormal, and, consequently, by Bessel’s inequality

$$|(x, y/\|y\|)|^2 \le \|x\|^2.$$


The Schwarz inequality has important arithmetic, geometric, and ana- 
lytic consequences. 

(1) In any inner product space we define the distance $\delta(x, y)$ between
two vectors $x$ and $y$ by

$$\delta(x, y) = \|x - y\| = \sqrt{(x - y, x - y)}.$$

In order for $\delta$ to deserve to be called a distance, it should have the following
three properties:

(i) $\delta(x, y) = \delta(y, x)$,

(ii) $\delta(x, y) \ge 0$; $\delta(x, y) = 0$ if and only if $x = y$,

(iii) $\delta(x, y) \le \delta(x, z) + \delta(z, y)$.

(In a vector space it is also pleasant to be sure that distance is invariant
under translations:

(iv) $\delta(x, y) = \delta(x + z, y + z)$.)

Properties (i), (ii), and (iv) are obviously possessed by the particular $\delta$ we
defined; the only question is the validity of the “triangle inequality” (iii).
To prove (iii), we observe that

$$\begin{aligned}
\|x + y\|^2 &= (x + y, x + y) = \|x\|^2 + (x, y) + (y, x) + \|y\|^2 \\
&= \|x\|^2 + (x, y) + \overline{(x, y)} + \|y\|^2 \\
&= \|x\|^2 + 2\operatorname{Re}(x, y) + \|y\|^2 \\
&\le \|x\|^2 + 2|(x, y)| + \|y\|^2 \\
&\le \|x\|^2 + 2\|x\| \cdot \|y\| + \|y\|^2 \\
&= (\|x\| + \|y\|)^2;
\end{aligned}$$

replacing $x$ by $x - z$ and $y$ by $z - y$, we obtain

$$\|x - y\| \le \|x - z\| + \|z - y\|,$$

and this is equivalent to (iii). (We use $\operatorname{Re} \zeta$ to denote the real part of the
complex number $\zeta$; if $\zeta = \xi + i\eta$, with real $\xi$ and $\eta$, then $\operatorname{Re} \zeta = \xi$. The
imaginary part of $\zeta$, that is, the real number $\eta$, is denoted by $\operatorname{Im} \zeta$.)

(2) In the Euclidean space $\mathbb{R}^n$, the expression

$$\frac{(x, y)}{\|x\| \cdot \|y\|}$$

gives the cosine of the angle between $x$ and $y$. The Schwarz inequality in
this case merely amounts to the statement that the cosine of a real angle
is $\le 1$.

(3) In the unitary space $\mathbb{C}^n$, the Schwarz inequality becomes the so-called
Cauchy inequality; it asserts that for any two sequences $(\xi_1, \ldots, \xi_n)$
and $(\eta_1, \ldots, \eta_n)$ of complex numbers, we have

$$\Bigl|\sum_{i=1}^n \xi_i \bar\eta_i\Bigr|^2 \le \sum_{i=1}^n |\xi_i|^2 \cdot \sum_{i=1}^n |\eta_i|^2.$$

(4) In the space $\mathcal{P}$, the Schwarz inequality becomes

$$\Bigl|\int_0^1 x(t)\overline{y(t)}\,dt\Bigr|^2 \le \int_0^1 |x(t)|^2\,dt \cdot \int_0^1 |y(t)|^2\,dt.$$

It is useful to observe that the relations mentioned in (1)-(4) above are 
not only analogous to the general Schwarz inequality, but actually conse- 
quences or special cases of it. 
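
The Cauchy inequality of (3) and the triangle inequality of (1) are easy to test
numerically. The sketch below does so in $\mathbb{C}^3$; the particular vectors `x` and `y`
are arbitrary assumptions, and the last lines illustrate that equality holds in
the Schwarz inequality when one vector is a scalar multiple of the other.

```python
import math

def inner(x, y):
    """Inner product on C^n: linear in x, conjugate-linear in y."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x).real)

# Assumed example vectors in C^3.
x = [1 + 1j, 2, -1j]
y = [3, -1j, 2 + 2j]

schwarz_lhs = abs(inner(x, y))           # |(x, y)|
schwarz_rhs = norm(x) * norm(y)          # ||x|| * ||y||

triangle_lhs = norm([a + b for a, b in zip(x, y)])   # ||x + y||
triangle_rhs = norm(x) + norm(y)                     # ||x|| + ||y||

# A linearly dependent pair: w is a scalar multiple of x.
w = [(2 - 1j) * a for a in x]
equality_gap = abs(abs(inner(x, w)) - norm(x) * norm(w))

print(schwarz_lhs <= schwarz_rhs)
print(triangle_lhs <= triangle_rhs)
print(equality_gap < 1e-9)
```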

(5) We mention in passing that there is room between the two notions 
(general vector spaces and inner product spaces) for an intermediate con- 
cept of some interest. This concept is that of a normed vector space, a 
vector space in which there is an acceptable definition of length, but nothing
is said about angles. A norm in a (real or complex) vector space is a
numerically valued function $\|x\|$ of the vectors $x$ such that $\|x\| > 0$ unless
$x = 0$, $\|\alpha x\| = |\alpha| \cdot \|x\|$, and $\|x + y\| \le \|x\| + \|y\|$. Our discussion
so far shows that an inner product space is a normed vector space;
the converse is not in general true. In other words, if all we are given is a
norm satisfying the three conditions just given, it may not be possible to
find an inner product for which $(x, x)$ is identically equal to $\|x\|^2$. In
somewhat vague but perhaps suggestive terms, we may say that the norm 
in an inner product space has an essentially “quadratic” character that 
norms in general need not possess. 


§ 65. Complete orthonormal sets 


THEOREM. If $\mathcal{V}$ is an $n$-dimensional inner product space, then there exist
complete orthonormal sets in $\mathcal{V}$, and every complete orthonormal set in $\mathcal{V}$
contains exactly $n$ elements. The orthogonal dimension of $\mathcal{V}$ is the same as
its linear dimension.


PROOF. To people not fussy about hunting for an element in a possibly 
uncountable set, the existence of complete orthonormal sets is obvious. 
Indeed, we have already seen that orthonormal sets exist, so we choose 
one; if it is not complete, we may enlarge it, and if the resulting orthonor- 
mal set is still not complete, we enlarge it again, and we proceed in this 
way by induction. Since an orthonormal set may contain at most n ele- 
ments, in at most n steps we shall reach a complete orthonormal set. This 
set spans the whole space (see § 63, Theorem 2, (1) ⇒ (3)), and, since it is
also linearly independent, it is a basis and therefore contains precisely n 
elements. This proves the first assertion of the theorem; the second asser- 
tion is now obvious from the definitions. 

There is a constructive method of avoiding this crude induction, and 
since it sheds further light on the notions involved, we reproduce it here 
as an alternative proof of the theorem. 

Let $\mathcal{X} = \{x_1, \ldots, x_n\}$ be any basis in $\mathcal{V}$. We shall construct a complete
orthonormal set $\mathcal{Y} = \{y_1, \ldots, y_n\}$ with the property that each $y_j$ is a
linear combination of $x_1, \ldots, x_j$. To begin the construction, we observe
that $x_1 \ne 0$ (since $\mathcal{X}$ is linearly independent) and we write $y_1 = x_1/\|x_1\|$.
Suppose now that $y_1, \ldots, y_r$ have been found so that they form an orthonormal
set and so that each $y_j$ ($j = 1, \ldots, r$) is a linear combination of
$x_1, \ldots, x_j$. We write

$$z = x_{r+1} - (\alpha_1 y_1 + \cdots + \alpha_r y_r),$$

where the values of the scalars $\alpha_1, \ldots, \alpha_r$ are still to be determined. Since

$$(z, y_j) = \Bigl(x_{r+1} - \sum_i \alpha_i y_i,\; y_j\Bigr) = (x_{r+1}, y_j) - \alpha_j$$

for $j = 1, \ldots, r$, it follows that if we choose $\alpha_j = (x_{r+1}, y_j)$, then $(z, y_j) = 0$
for $j = 1, \ldots, r$. Since, moreover, $z$ is a linear combination of $x_{r+1}$ and
$y_1, \ldots, y_r$, it is also a linear combination of $x_{r+1}$ and $x_1, \ldots, x_r$. Finally
$z$ is different from zero, since $x_1, \ldots, x_r, x_{r+1}$ are linearly independent and
the coefficient of $x_{r+1}$ in the expression for $z$ is not zero. We write
$y_{r+1} = z/\|z\|$; clearly $\{y_1, \ldots, y_r, y_{r+1}\}$ is again an orthonormal set with all
the desired properties, and the induction step is accomplished. We shall
make use of the fact that not only is each $y_j$ a linear combination of the $x$’s
with indices between 1 and $j$, but, vice versa, each $x_j$ is a linear combination
of the $y$’s with indices between 1 and $j$. The method of converting a
linear basis into a complete orthonormal set that we just described is known
as the Gram-Schmidt orthogonalization process.
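
The construction just described translates almost line by line into a short
program. The following Python sketch is one possible rendering, assuming the
standard inner product on $\mathbb{C}^n$; the function names and the sample basis are
hypothetical and serve only as illustration.

```python
import math

def inner(x, y):
    """Inner product on C^n: linear in x, conjugate-linear in y."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x).real)

def gram_schmidt(basis):
    """Turn a linearly independent list of vectors into an orthonormal list."""
    ys = []
    for x_next in basis:
        # z = x_{r+1} - (alpha_1 y_1 + ... + alpha_r y_r),
        # with alpha_j = (x_{r+1}, y_j) so that (z, y_j) = 0.
        z = list(x_next)
        for y in ys:
            alpha = inner(x_next, y)
            z = [zi - alpha * yi for zi, yi in zip(z, y)]
        length = norm(z)
        if length < 1e-12:
            raise ValueError("input vectors are linearly dependent")
        ys.append([zi / length for zi in z])   # y_{r+1} = z / ||z||
    return ys

# Assumed sample basis of C^3 (here with real entries).
basis = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
ys = gram_schmidt(basis)

# The Gram matrix ((y_i, y_j)) should be the identity up to rounding.
gram = [[inner(u, v) for v in ys] for u in ys]
print(all(abs(gram[i][j] - (1 if i == j else 0)) < 1e-9
          for i in range(3) for j in range(3)))
```

The coefficients are computed against the original vector `x_next`, exactly as
in the text; numerical libraries often prefer a reordered ("modified") variant
for better rounding behavior, which is a different choice from the one shown.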

We shall find it convenient and natural, in inner product spaces, to 
work exclusively with such bases as are also complete orthonormal sets.
We shall call such a basis an orthonormal basis or an orthonormal coordinate 
system; in the future, whenever we discuss bases that are not necessarily 
orthonormal, we shall emphasize this fact by calling them linear bases. 


EXERCISES 


1. Convert $\mathcal{P}_2$ into an inner product space by writing $(x, y) = \int_0^1 x(t)\overline{y(t)}\,dt$
whenever $x$ and $y$ are in $\mathcal{P}_2$, and find a complete orthonormal set in that space.


2. If $x$ and $y$ are orthogonal unit vectors (that is, $\{x, y\}$ is an orthonormal set),
what is the distance between $x$ and $y$?


3. Prove that if $|(x, y)| = \|x\| \cdot \|y\|$ (that is, if the Schwarz inequality reduces
to an equality), then $x$ and $y$ are linearly dependent.


4. (a) Prove that the Schwarz inequality remains true if, in the definition of an 
inner product, “strictly positive” is replaced by “non-negative.” 

(b) Prove that for a “non-negative” inner product of the type mentioned in
(a), the set of all those vectors $x$ for which $(x, x) = 0$ is a subspace.

(c) Form the quotient space modulo the subspace mentioned in (b) and show
that the given “inner product” induces on that quotient space, in a natural manner,
an honest (strictly positive) inner product.

(d) Do the considerations in (a), (b), and (c) extend to normed spaces (with 
possibly no inner product)? 


5. (a) Given a strictly positive number $\alpha$, try to define a norm in $\mathbb{R}^2$ by writing

$$\|x\| = (|\xi_1|^\alpha + |\xi_2|^\alpha)^{1/\alpha}$$

whenever $x = (\xi_1, \xi_2)$. Under what conditions on $\alpha$ does this equation define a
norm?

(b) Prove that the equation

$$\|x\| = \max\{|\xi_1|, |\xi_2|\}$$

defines a norm in $\mathbb{R}^2$.

(c) To which ones among the norms defined in (a) and (b) does there correspond
an inner product in $\mathbb{R}^2$ such that $\|x\|^2 = (x, x)$ for all $x$ in $\mathbb{R}^2$?


6. (a) Prove that a necessary and sufficient condition on a real normed space that
there exist an inner product satisfying the equation $\|x\|^2 = (x, x)$ for all $x$ is that

$$\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2$$

for all $x$ and $y$.

(b) Discuss the corresponding assertion for complex spaces.

(c) Prove that a necessary and sufficient condition on a norm in $\mathbb{R}^2$ that there
exist an inner product satisfying the equation $\|x\|^2 = (x, x)$ for all $x$ in $\mathbb{R}^2$ is that
the locus of the equation $\|x\| = 1$ be an ellipse.


7. If $\{x_1, \ldots, x_n\}$ is a complete orthonormal set in an inner product space, and
if $y_j = \sum_{i=1}^{j} x_i$, $j = 1, \ldots, n$, express in terms of the $x$’s the vectors obtained by
applying the Gram-Schmidt orthogonalization process to the $y$’s.


§ 66. Projection theorem 


Since a subspace of an inner product space may itself be considered as 
an inner product space, the theorem of the preceding section may be ap- 
plied. The following result, called the projection theorem, is the most im- 
portant application. 


THEOREM. If $\mathcal{M}$ is any subspace of a finite-dimensional inner product
space $\mathcal{V}$, then $\mathcal{V}$ is the direct sum of $\mathcal{M}$ and $\mathcal{M}^\perp$, and $\mathcal{M}^{\perp\perp} = \mathcal{M}$.


PROOF. Let $\mathcal{X} = \{x_1, \ldots, x_m\}$ be an orthonormal set that is complete
in $\mathcal{M}$, and let $z$ be any vector in $\mathcal{V}$. We write $x = \sum_i \alpha_i x_i$, where
$\alpha_i = (z, x_i)$; it follows from § 63, Theorem 1, that $y = z - x$ is in $\mathcal{M}^\perp$, so that
$z$ is the sum of two vectors, $z = x + y$, with $x$ in $\mathcal{M}$ and $y$ in $\mathcal{M}^\perp$. That
$\mathcal{M}$ and $\mathcal{M}^\perp$ are disjoint is clear; if $x$ belonged to both, then we should have
$\|x\|^2 = (x, x) = 0$. It follows from the theorem of § 18 that $\mathcal{V} = \mathcal{M} \oplus \mathcal{M}^\perp$.

We observe that in the decomposition $z = x + y$, we have

$$(z, x) = (x + y, x) = \|x\|^2 + (y, x) = \|x\|^2,$$

and, similarly,

$$(z, y) = \|y\|^2.$$

Hence, if $z$ is in $\mathcal{M}^{\perp\perp}$, so that $(z, y) = 0$, then $\|y\|^2 = 0$, so that $z$ ($= x$)
is in $\mathcal{M}$; in other words, $\mathcal{M}^{\perp\perp}$ is contained in $\mathcal{M}$. Since we already know
that $\mathcal{M}$ is contained in $\mathcal{M}^{\perp\perp}$, the proof of the theorem is complete.

This kind of direct sum decomposition of an inner product space (via a
subspace and its orthogonal complement) is of considerable geometric interest.
We shall study the associated projections a little later; they turn
out to be an interesting and important subclass of the class of all projections.
At present we remark only on the connection with the Pythagorean
theorem; since $(z, x) = \|x\|^2$ and $(z, y) = \|y\|^2$, we have

$$\|z\|^2 = (z, z) = (z, x) + (z, y) = \|x\|^2 + \|y\|^2.$$

In other words, the square of the hypotenuse is the sum of the squares of
the sides. More generally, if $\mathcal{M}_1, \ldots, \mathcal{M}_k$ are pairwise orthogonal subspaces
in an inner product space $\mathcal{V}$, and if $x = x_1 + \cdots + x_k$, with $x_j$ in
$\mathcal{M}_j$ for $j = 1, \ldots, k$, then

$$\|x\|^2 = \|x_1\|^2 + \cdots + \|x_k\|^2.$$
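
The decomposition $z = x + y$ used in the proof can be computed directly once an
orthonormal set complete in the subspace is known. The sketch below does this in
$\mathbb{R}^3$ (treated as a special case of $\mathbb{C}^3$); the subspace, its orthonormal set, and
the vector `z` are assumptions made for the example.

```python
import math

def inner(x, y):
    """Inner product on C^n: linear in x, conjugate-linear in y."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

def decompose(z, orthonormal_set):
    """Return (x, y) with x = sum_i (z, x_i) x_i in M and y = z - x in M-perp."""
    x = [0j] * len(z)
    for e in orthonormal_set:
        alpha = inner(z, e)
        x = [xi + alpha * ei for xi, ei in zip(x, e)]
    y = [zi - xi for zi, xi in zip(z, x)]
    return x, y

# Assumed subspace M of R^3, spanned by an orthonormal pair.
s = 1 / math.sqrt(2)
m_basis = [[1, 0, 0], [0, s, s]]
z = [2, 1, 3]

x, y = decompose(z, m_basis)

# y is orthogonal to M, and the Pythagorean relation holds.
print([abs(inner(y, e)) < 1e-12 for e in m_basis])
print(abs(inner(z, z).real - (inner(x, x).real + inner(y, y).real)) < 1e-9)
```

For this data the component in the subspace is approximately (2, 2, 2) and the
orthogonal component is approximately (0, -1, 1), so that 14 = 12 + 2.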


§ 67. Linear functionals 


We are now in a position to study linear functionals on inner product 
spaces. For a general n-dimensional vector space the dual space is also 
n-dimensional and is therefore isomorphic to the original space. There is, 
however, no obvious natural isomorphism that we can set up; we have to 
wait for the second dual space to get back where we came from. The main 
point of the theorem we shall prove now is that in inner product spaces 
there is a “natural” correspondence between $\mathcal{V}$ and $\mathcal{V}'$; the only cloud on
the horizon is that in general it is not quite an isomorphism. 


THEOREM. To any linear functional $y'$ on a finite-dimensional inner product
space $\mathcal{V}$ there corresponds a unique vector $y$ in $\mathcal{V}$ such that $y'(x) = (x, y)$
for all $x$.


PROOF. If $y' = 0$, we may choose $y = 0$; let us from now on assume that
$y'(x)$ is not identically zero. Let $\mathcal{M}$ be the subspace consisting of all vectors
$x$ for which $y'(x) = 0$, and let $\mathcal{N} = \mathcal{M}^\perp$ be the orthogonal complement of
$\mathcal{M}$. The subspace $\mathcal{N}$ contains a non-zero vector $y_0$; multiplying by a suitable
constant, we may assume that $\|y_0\| = 1$. We write $y = \overline{y'(y_0)} \cdot y_0$.
(The bar denotes complex conjugation, as usual; in case $\mathcal{V}$ is a real inner
product space and not a unitary space, the bar may be omitted.) We do
then have the desired relation

(1) $y'(x) = (x, y)$

at least for $x = y_0$ and for all $x$ in $\mathcal{M}$. For an arbitrary $x$ in $\mathcal{V}$, we write
$x_0 = x - \lambda y_0$, where

$$\lambda = \frac{y'(x)}{y'(y_0)};$$

then $y'(x_0) = 0$ and $x = x_0 + \lambda y_0$ is a linear combination of two vectors
for each of which (1) is valid. From the linearity of both sides of (1) it
follows that (1) holds for $x$, as was to be proved.

To prove uniqueness, suppose that $(x, y_1) = (x, y_2)$ for all $x$. It follows
that $(x, y_1 - y_2) = 0$ for all $x$, and therefore in particular for $x = y_1 - y_2$,
so that $\|y_1 - y_2\|^2 = 0$ and $y_1 = y_2$.
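
In coordinates the representing vector is easy to exhibit. The following Python
sketch is a shortcut rather than the construction of the proof: it assumes the
space $\mathbb{C}^3$ with its standard inner product and standard orthonormal basis
$e_1, e_2, e_3$, and uses $y = \sum_i \overline{y'(e_i)}\, e_i$. The functional `y_prime` is a
hypothetical example.

```python
def inner(x, y):
    """Inner product on C^n: linear in x, conjugate-linear in y."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

def y_prime(x):
    """A hypothetical linear functional on C^3."""
    return (2 + 1j) * x[0] - 3j * x[1] + 4 * x[2]

def representing_vector(functional, n):
    """Vector y with functional(x) = (x, y), built from an orthonormal basis."""
    y = []
    for i in range(n):
        e = [0] * n
        e[i] = 1                                   # i-th coordinate vector
        y.append(complex(functional(e)).conjugate())
    return y

y = representing_vector(y_prime, 3)

# Check the relation y'(x) = (x, y) on a few assumed test vectors.
samples = [[1, 0, 0], [1j, 2, -1], [0.5 - 2j, 3 + 3j, 1j]]
print(all(abs(y_prime(x) - inner(x, y)) < 1e-12 for x in samples))
```

The conjugation in `representing_vector` is the same bar that appears in the
formula $y = \overline{y'(y_0)} \cdot y_0$ of the proof.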

The correspondence $y' \rightleftarrows y$ is a one-to-one correspondence between $\mathcal{V}$
and $\mathcal{V}'$, with the property that to $y'_1 + y'_2$ there corresponds $y_1 + y_2$, and
to $\alpha y'$ there corresponds $\bar\alpha y$; for this reason we refer to it as a conjugate
isomorphism. In spite of the fact that this conjugate isomorphism makes
$\mathcal{V}'$ practically indistinguishable from $\mathcal{V}$, it is wise to keep the two
conceptually separate. One reason for this is that we should like $\mathcal{V}'$ to be an
inner product space along with $\mathcal{V}$; if, however, we follow the clue given by
the conjugate isomorphism between $\mathcal{V}$ and $\mathcal{V}'$, the conjugation again causes
trouble. Let $y'_1$ and $y'_2$ be any two elements of $\mathcal{V}'$; if $y'_1(x) = (x, y_1)$ and
$y'_2(x) = (x, y_2)$, the temptation is great to write

$$(y'_1, y'_2) = (y_1, y_2).$$

A moment’s reflection will show that this expression may not satisfy § 61,
(2), and is therefore not a suitable inner product. The trouble arises in
complex (that is, unitary) spaces only; we have, for example,

$$(\alpha y'_1, y'_2) = (\bar\alpha y_1, y_2) = \bar\alpha (y_1, y_2) = \bar\alpha (y'_1, y'_2).$$

The remedy is clear; we write

(2) $(y'_1, y'_2) = \overline{(y_1, y_2)} = (y_2, y_1)$;

we leave it to the reader to verify that with this definition $\mathcal{V}'$ becomes an
inner product space in all cases. We shall denote this inner product space
by $\mathcal{V}^*$.

We remark that our troubles (if they can be called that) with complex 
conjugation have so far been more notational than conceptual; it is still 
true that the only difference between the theory of Euclidean spaces and 
the theory of unitary spaces is that an occasional bar appears in the latter. 
More profound differences between the two theories will arise when we go 
to study linear transformations. 


§ 68. Parentheses versus brackets 


It becomes necessary now to straighten out the relation between general
vector spaces and inner product spaces. The theorem of the preceding
section shows that, as long as we are careful about complex conjugation,
$(x, y)$ can completely take the place of $[x, y]$. It might seem that it would
have been desirable to develop the entire subject of general vector spaces
in such a way that the concept of orthogonality in a unitary space becomes
not merely an analogue but a special case of some previously studied general
relation between vectors and functionals. One way, for example, of avoiding
the unpleasantness of conjugation (or, rather, of shifting it to a less
conspicuous position) would have been to define the dual space of a complex
vector space as the set of conjugate linear functionals, that is, the set
of numerically valued functions $y$ for which

$$y(\alpha_1 x_1 + \alpha_2 x_2) = \bar\alpha_1 y(x_1) + \bar\alpha_2 y(x_2).$$

Because it seemed pointless (and contrary to common usage) to introduce
this complication into the general theory, we chose instead the roundabout
way that we just traveled. Since from now on we shall deal with inner
product spaces only, we ask the reader mentally to revise all the preceding
work by replacing, throughout, the bracket $[x, y]$ by the parenthesis $(x, y)$.
Let us examine the effect of this change on the theorems and definitions of
the first two chapters.

The replacement of $\mathcal{V}'$ by $\mathcal{V}^*$ is merely a change of notation; the new
symbol is supposed to remind us that something new (namely, an inner
product) has been added to $\mathcal{V}'$. Of a little more interest is the (conjugate)
isomorphism between $\mathcal{V}$ and $\mathcal{V}^*$; by means of it the theorems of § 15,
asserting the existence of linear functionals with various properties, may
now be interpreted as asserting the existence of certain vectors in $\mathcal{V}$ itself.
Thus, for example, the existence of a dual basis to any given basis
$\mathcal{X} = \{x_1, \ldots, x_n\}$ implies now the existence of a basis $\mathcal{Y} = \{y_1, \ldots, y_n\}$ (of $\mathcal{V}$)
with the property that $(x_i, y_j) = \delta_{ij}$.

More exciting still is the implied replacement of the annihilator $\mathcal{M}^0$ of a
subspace $\mathcal{M}$ ($\mathcal{M}^0$ lying in $\mathcal{V}'$ or $\mathcal{V}^*$) by the orthogonal complement $\mathcal{M}^\perp$
(lying, along with $\mathcal{M}$, in $\mathcal{V}$). The most radical new development, however,
concerns the adjoint of a linear transformation. Thus we may write the
analogue of § 44, (1), and corresponding to every linear transformation $A$
on $\mathcal{V}$ we may define a linear transformation $A^*$ by writing

$$(Ax, y) = (x, A^*y)$$

for every $x$. It follows from this definition that $A^*$ is again a linear
transformation defined on the same vector space $\mathcal{V}$, but, because of the Hermitian
symmetry of $(x, y)$, the relation between $A$ and $A^*$ is not quite the
same as the relation between $A$ and $A'$. The most notable difference is
that (in a unitary space) $(\alpha A)^* = \bar\alpha A^*$ (and not $(\alpha A)^* = \alpha A^*$). Associated
with this phenomenon is the fact that if the matrix of $A$, with respect
to some fixed basis, is $(\alpha_{ij})$, then the matrix of $A^*$, with respect to the dual
basis, is not $(\alpha_{ji})$ but $(\bar\alpha_{ji})$. For determinants we do not have
$\det A^* = \det A$ but $\det A^* = \overline{\det A}$, and, consequently, the proper values of $A^*$ are
not the same as those of $A$, but rather their conjugates. Here, however,
the differences stop. All the other results of § 44 on the anti-isomorphic
nature of the correspondence $A \to A^*$ are valid; the identity $A = A^{**}$
is strictly true and does not need the help of an isomorphism to interpret it.
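
For matrices with respect to an orthonormal basis of $\mathbb{C}^n$ (an assumption made
here so that the basis is its own dual), the adjoint is the conjugate transpose.
The Python sketch below checks the defining relation $(Ax, y) = (x, A^*y)$, the rule
$(\alpha A)^* = \bar\alpha A^*$, the determinant relation, and $A = A^{**}$ on a hypothetical
2-by-2 example.

```python
def inner(x, y):
    """Inner product on C^n: linear in x, conjugate-linear in y."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

def apply(A, x):
    """Matrix-vector product, A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def adjoint(A):
    """Conjugate transpose: entry (j, i) is the conjugate of entry (i, j)."""
    rows, cols = len(A), len(A[0])
    return [[complex(A[i][j]).conjugate() for i in range(rows)]
            for j in range(cols)]

def scale(c, A):
    return [[c * a for a in row] for row in A]

def det2(A):
    """Determinant of a 2-by-2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def close(A, B, tol=1e-12):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

# Assumed example data.
A = [[1 + 2j, 3], [-1j, 4 - 1j]]
alpha = 2 - 3j
x = [1j, 2 - 1j]
y = [3, 1 + 1j]

print(abs(inner(apply(A, x), y) - inner(x, apply(adjoint(A), y))) < 1e-12)
print(close(adjoint(scale(alpha, A)), scale(alpha.conjugate(), adjoint(A))))
print(abs(det2(adjoint(A)) - det2(A).conjugate()) < 1e-12)
print(close(adjoint(adjoint(A)), A))
```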

Presently we shall discuss linear transformations on inner product 
spaces and we shall see that the principal new feature that differentiates 
their study from the discussion of Chapter II is the possibility of compar- 
ing A and A* as linear transformations on the same space, and of investi- 
gating those classes of linear transformations that bear a particularly simple 
relation to their adjoints. 


§ 69. Natural isomorphisms 


There is now only one more possible doubt that the reader might (or, at
any rate, should) have. Many of our preceding results were consequences
of such reflexivity relations as $A^{**} = A$; do these remain valid after the
brackets-to-parentheses revolution? More to the point is the following
way of asking the question. Everything we say about a unitary space $\mathcal{V}$
must also be true about the unitary space $\mathcal{V}^*$; in particular it is also in a
natural conjugate isomorphic relation with its dual space $\mathcal{V}^{**}$. If now to
every vector in $\mathcal{V}$ we make correspond a vector in $\mathcal{V}^{**}$, by first applying
the natural conjugate isomorphism from $\mathcal{V}$ to $\mathcal{V}^*$ and then going the same
way from $\mathcal{V}^*$ to $\mathcal{V}^{**}$, then this mapping is a rival for the title of natural
mapping from $\mathcal{V}$ to $\mathcal{V}^{**}$, a title already awarded in Chapter I to a seemingly
different correspondence. What is the relation between the two natural
correspondences? Our statements about the coincidence, except for trivial
modifications, of the parenthesis and bracket theories, are really justified
by the fact, which we shall now prove, that the two mappings are the same.
(It should not be surprising, since $\bar{\bar\alpha} = \alpha$, that after two applications the
bothersome conjugation disappears.) The proof is shorter than the introduction
to it.

Let $y_0$ be any element of $\mathcal{V}$; to it there corresponds the linear functional
$y_0^*$ in $\mathcal{V}^*$, defined by $y_0^*(x) = (x, y_0)$, and to $y_0^*$, in turn, there corresponds
the linear functional $y_0^{**}$ in $\mathcal{V}^{**}$, defined by $y_0^{**}(y^*) = (y^*, y_0^*)$. Both
these correspondences are given by the mapping introduced in this chapter.
Earlier (see § 16) the correspondent $y_0^{**}$ in $\mathcal{V}^{**}$ of $y_0$ in $\mathcal{V}$ was defined by
$y_0^{**}(y^*) = y^*(y_0)$ for all $y^*$ in $\mathcal{V}^*$; we must show that $y_0^{**}$, as we here
defined it, satisfies this identity. Let $y^*$ be any linear functional on $\mathcal{V}$
(that is, any element of $\mathcal{V}^*$), and let $y$ be the vector corresponding to it; we have

$$y_0^{**}(y^*) = (y^*, y_0^*) = (y_0, y) = y^*(y_0).$$

(The middle equality comes from the definition of inner product in $\mathcal{V}^*$.)
This settles all our problems.


EXERCISES

1. If $\mathcal{M}$ and $\mathcal{N}$ are subspaces of a finite-dimensional inner product space, then

$$(\mathcal{M} + \mathcal{N})^\perp = \mathcal{M}^\perp \cap \mathcal{N}^\perp$$

and

$$(\mathcal{M} \cap \mathcal{N})^\perp = \mathcal{M}^\perp + \mathcal{N}^\perp.$$


2. If $y'(x) = (\xi_1 + \xi_2 + \xi_3)$ for each $x = (\xi_1, \xi_2, \xi_3)$ in $\mathbb{C}^3$, find a vector $y$ in
$\mathbb{C}^3$ such that $y'(x) = (x, y)$.


3. If $y$ is a vector in an inner product space, if $A$ is a linear transformation on
that space, and if $f(x) = (y, Ax)$ for every vector $x$, then $f$ is a linear functional;
find a vector $y^*$ such that $f(x) = (x, y^*)$ for every $x$.


4. (a) If $A$ is a linear transformation on a finite-dimensional inner product space,
then $\operatorname{tr}(A^*A) \ge 0$; a necessary and sufficient condition that $\operatorname{tr}(A^*A) = 0$ is that
$A = 0$. (Hint: look at matrices.) This property of traces can often be used to
obtain otherwise elusive algebraic facts about products of transformations and their
adjoints.

(b) Prove by a trace argument, and also directly, that if $A_1, \ldots, A_k$ are linear
transformations on a finite-dimensional inner product space and if $\sum_{j=1}^{k} A_j^*A_j = 0$,
then $A_1 = \cdots = A_k = 0$.

(c) If A*A = B*B — BB*, then A = 0. 

(d) If A* commutes with A and if A commutes with B, then A* commutes with 
B. (Hint: if C = A*B — BA* and D = AB — BA, then tr (C*C) = tr (D*D) 
+ tr[(A*A — AA*)(B*B — BB*)].) 


5. (a) Suppose that $\mathcal{H}$ is a unitary space, and form the set of all ordered pairs
$(x, y)$ with $x$ and $y$ in $\mathcal{H}$ (that is, the direct sum of $\mathcal{H}$ with itself). Prove that the
equation

$$((x_1, y_1), (x_2, y_2)) = (x_1, x_2) + (y_1, y_2)$$

defines an inner product in the direct sum $\mathcal{H} \oplus \mathcal{H}$.

(b) If $U$ is defined by $U(x, y) = (y, -x)$, then $U^*U = 1$.

(c) The graph of a linear transformation $A$ on $\mathcal{H}$ is the set of all those elements
$(x, y)$ of $\mathcal{H} \oplus \mathcal{H}$ for which $y = Ax$. Prove that the graph of every linear transformation
on $\mathcal{H}$ is a subspace of $\mathcal{H} \oplus \mathcal{H}$.

(d) If $A$ is a linear transformation on $\mathcal{H}$ with graph $\mathcal{G}$, then the graph of $A^*$
is the orthogonal complement (in $\mathcal{H} \oplus \mathcal{H}$) of the image under $U$ (see (b)) of the
graph of $A$.


6. (a) If for every linear transformation $A$ on a finite-dimensional inner product
space $N(A) = \sqrt{\operatorname{tr}(A^*A)}$, then $N$ is a norm (on the space of all linear
transformations).
(b) Is the norm N induced by an inner product? 


7. (a) Two linear transformations A and B on an inner product space are called 
congruent if there exists an invertible linear transformation P such that B = P*AP. 
(The concept is frequently defined for the “quadratic forms” associated with linear 
transformations and not for the linear transformations themselves; this is largely 
a matter of taste. Note that if $\alpha(x) = (Ax, x)$ and $\beta(x) = (Bx, x)$, then $B = P^*AP$
implies that $\beta(x) = \alpha(Px)$.) Prove that congruence is an equivalence relation.

(b) If $A$ and $B$ are congruent, then so also are $A^*$ and $B^*$.

(c) Does there exist a linear transformation $A$ such that $A$ is congruent to a
scalar $\alpha$, but $A \ne \alpha$?

(d) Do there exist linear transformations $A$ and $B$ such that $A$ and $B$ are congruent,
but $A^2$ and $B^2$ are not?

(e) If two invertible transformations are congruent, then so are their inverses. 


§ 70. Self-adjoint transformations 


Let us now study the algebraic structure of the class of all linear
transformations on an inner product space $\mathcal{V}$. In many fundamental respects
this class resembles the class of all complex numbers. In both systems,
notions of addition, multiplication, 0, and 1 are defined and have similar
properties, and in both systems there is an involutory anti-automorphism
of the system onto itself (namely, $A \to A^*$ and $\zeta \to \bar\zeta$). We shall use
this analogy as a heuristic principle, and we shall attempt to carry over 
to linear transformations some well-known concepts from the complex 
domain. We shall be hindered in this work by two difficulties in the theory 
of linear transformations, of which, possibly surprisingly, the second is 
much more serious; they are the impossibility of unrestricted division and 
the non-commutativity of general linear transformations. 

The three most important subsets of the complex number plane are the 
set of real numbers, the set of positive real numbers, and the set of num- 
bers of absolute value one. We shall now proceed systematically to use 
our heuristic analogy of transformations with complex numbers, and to try 
to discover the analogues among transformations of these well-known nu- 
merical concepts. 

When is a complex number real? Clearly a necessary and sufficient 
condition for the reality of $\zeta$ is the validity of the equation $\zeta = \bar\zeta$. We
might accordingly (remembering that the analogue of the complex conju- 
gate for linear transformations is the adjoint) define a linear transforma- 
tion A to be real if A = A*. More commonly linear transformations A 
for which A = A* are called self-adjoint; in real inner product spaces the 
usual word is symmetric, and, in complex inner product spaces, Hermitian. 
We shall see that self-adjoint transformations do indeed play the same role 
as real numbers. 

It is quite easy to characterize the matrix of a self-adjoint transformation
with respect to an orthonormal basis $\mathcal{X} = \{x_1, \ldots, x_n\}$. If the matrix
of $A$ is $(\alpha_{ij})$, then we know that the matrix of $A^*$ with respect to the dual
basis of $\mathcal{X}$ is $(\alpha_{ij}^*)$, where $\alpha_{ij}^* = \bar\alpha_{ji}$; since an orthonormal basis is self-dual
and since $A = A^*$, we have

$$\alpha_{ij} = \bar\alpha_{ji}.$$

We leave it to the reader to verify the converse: if we define a linear
transformation $A$ by means of a matrix $(\alpha_{ij})$ and an arbitrary orthonormal
coordinate system $\mathcal{X} = \{x_1, \ldots, x_n\}$, via the usual equations

$$A\Bigl(\sum_i \xi_i x_i\Bigr) = \sum_i \eta_i x_i, \qquad \eta_i = \sum_j \alpha_{ij} \xi_j,$$

and if the matrix $(\alpha_{ij})$ is such that $\alpha_{ij} = \bar\alpha_{ji}$, then $A$ is self-adjoint.

The algebraic rules for the manipulation of self-adjoint transformations 
are easy to remember if we think of such transformations as the analogues 
of real numbers. Thus, if A and B are self-adjoint, so is A + B; if A is 
self-adjoint and different from 0, and if $\alpha$ is a non-zero scalar, then a necessary
and sufficient condition that $\alpha A$ be self-adjoint is that $\alpha$ be real; and
if $A$ is invertible, then both or neither of $A$ and $A^{-1}$ are self-adjoint. The
place where something always goes wrong is in multiplication; the product 
of two self-adjoint transformations need not be self-adjoint. The positive 
facts about products are given by the following two theorems. 


THEOREM 1. If A and B are self-adjoint, then a necessary and sufficient 
condition that AB (or BA) be self-adjoint is that AB = BA (that is that 
A and B commute). 


PROOF. If $AB = BA$, then $(AB)^* = B^*A^* = BA = AB$. If $(AB)^* = AB$,
then $AB = (AB)^* = B^*A^* = BA$.


THEOREM 2. If $A$ is self-adjoint, then $B^*AB$ is self-adjoint for all $B$; if $B$
is invertible and $B^*AB$ is self-adjoint, then $A$ is self-adjoint.


PROOF. If $A = A^*$, then $(B^*AB)^* = B^*A^*B^{**} = B^*AB$. If $B$ is invertible
and $B^*AB = (B^*AB)^* = B^*A^*B$, then (multiply by $(B^*)^{-1}$ on the
left and $B^{-1}$ on the right) $A = A^*$.

A complex number $\zeta$ is purely imaginary if and only if $\bar\zeta = -\zeta$. The
corresponding concept for linear transformations is identified by the word
skew; if a linear transformation $A$ on an inner product space is such that
$A^* = -A$, then $A$ is called skew symmetric or skew Hermitian according as
the space is real or complex. Here is some evidence for the thoroughgoing
nature of our analogy between complex numbers and linear transformations:
an arbitrary linear transformation $A$ may be expressed, in one and
only one way, in the form $A = B + C$, where $B$ is self-adjoint and $C$ is
skew. (The representation of $A$ in this form is sometimes called the Cartesian
decomposition of $A$.) Indeed, if we write

(1) $B = \dfrac{A + A^*}{2}$,

(2) $C = \dfrac{A - A^*}{2}$,

then we have $B^* = \dfrac{A^* + A}{2} = B$ and $C^* = \dfrac{A^* - A}{2} = -C$, and, of
course, $A = B + C$. From this proof of the existence of the Cartesian
decomposition, its uniqueness is also clear; if we do have $A = B + C$, then
$A^* = B - C$, and, consequently, $A$, $B$, and $C$ are again connected by (1)
and (2).

In the complex case there is a simple way of getting skew Hermitian
transformations from Hermitian ones, and vice versa: just multiply by
$i$ ($= \sqrt{-1}$). It follows that, in the complex case, every linear transformation
$A$ has a unique representation in the form $A = B + iC$, where $B$ and
$C$ are Hermitian. We shall refer to $B$ and $C$ as the real and imaginary
parts of $A$.
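
The Cartesian decomposition is easy to carry out on matrices. The sketch below
assumes that transformations on $\mathbb{C}^2$ are given by their matrices with respect
to an orthonormal basis, so that the adjoint is the conjugate transpose; the
matrix `A` and the helper names are hypothetical.

```python
def adjoint(A):
    """Conjugate transpose of a matrix given as a list of rows."""
    rows, cols = len(A), len(A[0])
    return [[complex(A[i][j]).conjugate() for i in range(rows)]
            for j in range(cols)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    return [[c * a for a in row] for row in A]

def close(A, B, tol=1e-12):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

# Assumed example matrix.
A = [[1 + 2j, 3 - 1j], [4j, -2]]
A_star = adjoint(A)

B = scale(0.5, add(A, A_star))                 # (A + A*)/2, self-adjoint
C = scale(0.5, add(A, scale(-1, A_star)))      # (A - A*)/2, skew

print(close(adjoint(B), B))                    # B* = B
print(close(adjoint(C), scale(-1, C)))         # C* = -C
print(close(add(B, C), A))                     # A = B + C

# Complex case: C = i * C_im with C_im Hermitian, so that A = B + i C_im.
C_im = scale(-1j, C)
print(close(adjoint(C_im), C_im))
print(close(add(B, scale(1j, C_im)), A))
```

Here `B` and `C_im` play the roles of the real and imaginary parts of `A`.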


EXERCISES 


1. Give an example of two self-adjoint transformations whose product is not 
self-adjoint. 


2. Consider the space $\mathcal{P}$, with the inner product given by $(x, y) = \int_0^1 x(t)\overline{y(t)}\,dt$.

(a) Is the multiplication operator $T$ (defined by $(Tx)(t) = tx(t)$) self-adjoint?
(b) Is the differentiation operator D self-adjoint? 


i _ j J F A 
3. (a) Prove that the equation (z, y) = Dino? (2) y (2) defines an inner prod 


uct in the space @,. 

(b) Is the multiplication operator $T$ (defined by $(Tx)(t) = tx(t)$) self-adjoint (with
respect to the inner product defined in (a))?

(c) Is the differentiation operator D self-adjoint? 


4. If A and B are linear transformations such that A and AB are self-adjoint 


and such that N(A) C N(B), then there exists a self-adjoint transformation C 
such that CA = B. 


5. If A and B are congruent and A is skew, does it follow that B is skew? 
6. If $A$ is skew, does it follow that so is $A^2$? How about $A^3$?


7. If both A and B are self-adjoint, or else if both are skew, then AB + BA is 
self-adjoint and $AB - BA$ is skew. What happens if one of $A$ and $B$ is self-adjoint
and the other skew? 


8. If $A$ is a skew-symmetric transformation on a Euclidean space, then $(Ax, x) = 0$
for every vector $x$. Converse?


9. If $A$ is self-adjoint, or skew, and if $A^2x = 0$, then $Ax = 0$.


10. (a) If A is a skew-symmetric transformation on a Euclidean space of odd 
dimension, then det A = 0. 

(b) If A is a skew-symmetric transformation on a finite-dimensional Euclidean 
space, then $\rho(A)$ is even.


§ 71. Polarization


Before continuing with the program of studying the analogies between 
complex numbers and linear transformations, we take time out to pick up
some important auxiliary results about inner product spaces. 


THEOREM 1. A necessary and sufficient condition that a linear
transformation A on an inner product space be 0 is that (Ax, y) = 0 for all
x and y.


PROOF. The necessity of the condition is obvious; sufficiency follows
from setting y equal to Ax.


THEOREM 2. A necessary and sufficient condition that a self-adjoint linear
transformation A on an inner product space be 0 is that (Ax, x) = 0 for
all x.


PROOF. Necessity is obvious. The proof of sufficiency begins by verifying
the identity


(1) (Ax, y) + (Ay, x) = (A(x + y), x + y) − (Ax, x) − (Ay, y).


(Expand the first term on the right side.) Since A is self-adjoint, the left
side of this equation is equal to 2 Re (Ax, y). The assumed condition
implies that the right side vanishes, and hence that Re (Ax, y) = 0. At this
point it is necessary to split the proof into two cases. If the inner product
space is real (that is, A is symmetric), then (Ax, y) is real, and therefore
(Ax, y) = 0. If the inner product space is complex (that is, A is
Hermitian), then we find a complex number θ such that |θ| = 1 and
θ(Ax, y) = |(Ax, y)|. (Here x and y are temporarily fixed.) The result we
already have, applied to θx in place of x, yields 0 = Re (A(θx), y) =
Re θ(Ax, y) = Re |(Ax, y)| = |(Ax, y)|. In either case, therefore,
(Ax, y) = 0 for all x and y, and the desired result follows from Theorem 1.

It is useful to ask how important is the self-adjointness of A in Theorem 
2; the answer is that in the complex case it is not important at all. 


THEOREM 3. A necessary and sufficient condition that a linear
transformation A on a unitary space be 0 is that (Ax, x) = 0 for all x.


PROOF. As before, necessity is obvious. For the proof of sufficiency we
use the so-called polarization identity:


(2) $\alpha\bar{\beta}(Ax, y) + \bar{\alpha}\beta(Ay, x) = (A(\alpha x + \beta y), \alpha x + \beta y) - |\alpha|^2(Ax, x) - |\beta|^2(Ay, y).$


(Just as for (1), the proof consists of expanding the first term on the right.)
If (Ax, x) is identically zero, then we obtain, first choosing α = β = 1,
and then α = i (= √−1), β = 1,

(Ax, y) + (Ay, x) = 0,
i(Ax, y) − i(Ay, x) = 0.


Dividing the second of these two equations by i and then forming their
arithmetic mean, we see that (Ax, y) = 0 for all x and y, so that, by
Theorem 1, A = 0.

This process of polarization is often used to get information about the
“bilinear form” (Ax, y) when only knowledge of the “quadratic form”
(Ax, x) is assumed.
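
The process can be tried out numerically. The following is only a sketch
under our own assumptions: vectors are lists of complex numbers, the inner
product is linear in its first argument, and the names inner, apply_matrix,
quadratic_form, and polarize are ours. The four-term formula it uses is the
standard consequence of the identity (2) obtained by taking α = 1, letting
β = γ run through 1, i, −1, −i, multiplying by γ, and adding.

```python
def inner(x, y):
    """Inner product on C^n, linear in x and conjugate linear in y."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))


def apply_matrix(A, x):
    """Apply the matrix A (a list of rows) to the vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def quadratic_form(A, x):
    """The quadratic form (Ax, x)."""
    return inner(apply_matrix(A, x), x)


def polarize(A, x, y):
    """Recover (Ax, y) using values of the quadratic form only."""
    total = 0
    for gamma in (1, 1j, -1, -1j):
        v = [a + gamma * b for a, b in zip(x, y)]
        total += gamma * quadratic_form(A, v)
    return total / 4


A = [[1 + 2j, 3], [-1j, 4 - 1j]]   # an arbitrary matrix, not self-adjoint
x = [1 + 1j, 2]
y = [3, -1j]

direct = inner(apply_matrix(A, x), y)   # (Ax, y) computed directly
recovered = polarize(A, x, y)           # (Ax, y) rebuilt from (Av, v)
```

The two values agree up to rounding, although polarize never evaluates an
inner product of the form (Au, v) with u different from v.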

It is important to observe that, despite its seeming innocence, Theorem 3
makes very essential use of the complex number system; it and many of its
consequences fail to be true for real inner product spaces. The proof, of
course, breaks down at our choice of α = √−1. For an example consider
a 90° rotation of the plane; it clearly has the property that it sends every
vector x into a vector orthogonal to x.
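
The rotation example is easy to check by machine. In the sketch below
(helper names and sample vectors are our own choices) the matrix R of the
rotation gives (Rx, x) = 0 on every real vector tried, while the same matrix,
regarded as acting on complex pairs, has a non-vanishing quadratic form.

```python
def inner(x, y):
    """Inner product, linear in x and conjugate linear in y."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))


def apply_matrix(A, x):
    """Apply the matrix A (a list of rows) to the vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


R = [[0, -1], [1, 0]]   # rotation of the plane through 90 degrees

# On real vectors the quadratic form (Rx, x) vanishes although R is not 0.
real_vectors = [[1, 0], [0, 1], [3, -2], [-5, 7]]
real_values = [inner(apply_matrix(R, x), x) for x in real_vectors]

# On a complex vector the same matrix gives a non-zero value.
z = [1, 1j]
complex_value = inner(apply_matrix(R, z), z)
```

This is consistent with Theorem 3: over the complex numbers a non-zero
transformation cannot have an identically vanishing quadratic form.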

We have seen that Hermitian transformations play the same role as real 
numbers; the following theorem indicates that they are tied up with the 
concept of reality in deeper ways than through the formal analogy that 
suggested their definition. 


THEOREM 4. A necessary and sufficient condition that a linear
transformation A on a unitary space be Hermitian is that (Ax, x) be real for
all x.


PROOF. If A = A*, then

$(Ax, x) = (x, A^*x) = (x, Ax) = \overline{(Ax, x)},$


so that (Ax, x) is equal to its own conjugate and is therefore real. If,
conversely, (Ax, x) is always real, then


$(Ax, x) = \overline{(Ax, x)} = \overline{(x, A^*x)} = (A^*x, x),$


so that ([A − A*]x, x) = 0 for all x, and, by Theorem 3, A = A*.

Theorem 4 is false for real inner product spaces. This is to be expected,
for, in the first place, its proof depends on a theorem that is true for unitary
spaces only, and, in the second place, in a real space the reality of (Ax, x)
is automatic, whereas the identity (Ax, y) = (x, Ay) is not necessarily
satisfied.
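
As a concrete instance of this failure (the particular transformation is our
own choice, offered only as an illustration), let A be the transformation on
the real space of pairs x = (ξ₁, ξ₂), with the usual inner product, defined
by Ax = (ξ₁ + ξ₂, ξ₂). Then

```latex
(Ax, x) = (\xi_1 + \xi_2)\,\xi_1 + \xi_2^2 = \xi_1^2 + \xi_1\xi_2 + \xi_2^2
```

is real for every x, as it must be. For x = (1, 0) and y = (0, 1), however,
Ax = (1, 0) and Ay = (1, 1), so that (Ax, y) = 0 whereas (x, Ay) = 1; the
transformation A is not symmetric.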


§ 72. Positive transformations


When is a complex number ζ positive (that is, ≥ 0)? Two equally natural
necessary and sufficient conditions are that ζ may be written in the form
ζ = ξ² with some real ξ, or that ζ may be written in the form
$\zeta = \sigma\bar{\sigma}$ with some σ (in general complex). Remembering
also the fact that (at least for unitary spaces) the Hermitian character of a
transformation A can be described in terms of the inner products (Ax, x), we
may consider any one of the three conditions below and attempt to use it as
the definition of positiveness for transformations:


(1) A = B² for some self-adjoint B,
(2) A = C*C for some C,
(3) A is self-adjoint and (Ax, x) ≥ 0 for all x.


Before deciding which one of these three conditions to use as definition, we
observe that (1) ⇒ (2) ⇒ (3). Indeed: if A = B² and B = B*, then
A = BB = B*B, and if A = C*C, then A* = C*C = A and (Ax, x) =
(C*Cx, x) = (Cx, Cx) = ‖Cx‖² ≥ 0. It is actually true that (3) implies
(1), so that the three conditions are equivalent, but we shall not be able to
prove this until later. We adopt as our definition the third condition.
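
The implication (2) ⇒ (3) can be watched at work on a small example. The
sketch below rests on assumptions of our own: transformations are given by
matrices with respect to an orthonormal basis, so that the adjoint is the
conjugate transpose, and the matrix C and the vector x are arbitrary choices.

```python
def inner(x, y):
    """Inner product, linear in x and conjugate linear in y."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))


def apply_matrix(A, x):
    """Apply the matrix A (a list of rows) to the vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def adjoint(A):
    """Conjugate transpose of the matrix A."""
    return [[complex(A[i][j]).conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]


def matmul(A, B):
    """Matrix product AB."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


C = [[1, 2j], [0, 1 - 1j]]      # an arbitrary matrix
A = matmul(adjoint(C), C)       # A = C*C

x = [2 - 1j, 1 + 3j]
form_value = inner(apply_matrix(A, x), x)        # (Ax, x)
cx = apply_matrix(C, x)
norm_squared = inner(cx, cx).real                # ||Cx||^2
```

Here A comes out equal to its own adjoint, and (Ax, x) coincides with
‖Cx‖², which is non-negative.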


DEFINITION. A linear transformation A on an inner product space is
positive, in symbols A ≥ 0, if it is self-adjoint and if (Ax, x) ≥ 0 for all x.


More generally, we shall write A ≥ B (or B ≤ A) whenever A − B ≥ 0.
Although, of course, it is quite possible that the difference of two
transformations that are not even self-adjoint turns out to be positive, we
shall generally write inequalities for self-adjoint transformations only.
Observe that for a complex inner product space a part of the definition of
positiveness is superfluous; if (Ax, x) ≥ 0 for all x, then, in particular,
(Ax, x) is real for all x, and, by Theorem 4 of the preceding section, A must
be positive.

Positive transformations are usually called non-negative semidefinite. If
A ≥ 0 and (Ax, x) = 0 implies that x = 0, we shall say that A is strictly
positive; the usual term is positive definite. Since the Schwarz inequality
implies that

|(Ax, x)| ≤ ‖Ax‖ · ‖x‖,


we see that if A is a strictly positive transformation and if Ax = 0, then
x = 0, so that, on a finite-dimensional inner product space, a strictly
positive transformation is invertible. We shall see later that the converse is
true; if A ≥ 0 and A is invertible, then A is strictly positive. It is
sometimes convenient to indicate the fact that a transformation A is strictly
positive by writing A > 0; if A − B > 0, we may also write A > B (or
B < A).

It is possible to give a matricial characterization of positive
transformations; we shall postpone this discussion till later. In the
meantime we shall have occasion to refer to positive matrices, meaning
thereby Hermitian symmetric matrices (αᵢⱼ) (that is,
$\alpha_{ij} = \bar{\alpha}_{ji}$) with the property that for every sequence
(ξ₁, …, ξₙ) of n scalars we have
$\sum_i \sum_j \alpha_{ij}\bar{\xi}_i\xi_j \ge 0$.
(In the real case the bars may be omitted; in the complex case Hermitian
symmetry follows from the other condition.) These conditions are clearly
equivalent to the condition that (αᵢⱼ) be the matrix, with respect to some
orthonormal coordinate system, of a positive transformation.

The algebraic rules for combining positive transformations are similar 
to those for self-adjoint transformations as far as sums, scalar multiples, 
and inverses are concerned; even § 70, Theorem 2, remains valid if we re- 
place “self-adjoint” by “positive” throughout. It is also true that if A 
and B are positive, then a necessary and sufficient condition that AB (or 
BA) be positive is that AB = BA (that is, that A and B commute), but 
we shall have to postpone the proof of this statement for a while. 


EXERCISES 


1. Under what conditions on a linear transformation A does the function of
two variables, whose value at x and y is (Ax, y), satisfy the conditions on an
inner product?


2. Which of the following matrices are positive? 

w ( i i} @ (9): 
111 i-i 
TEP) ae) 

© (i o) 


3. For which values of α is the matrix

$\begin{pmatrix} \alpha & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix}$

positive?


4. (a) If A is self-adjoint, then tr A is real.
(b) If A ≥ 0, then tr A ≥ 0.


5. (a) Give an example of a positive matrix some of whose entries are negative. 
(b) Give an example of a non-positive matrix all of whose entries are positive. 


6. A necessary and sufficient condition that a two-by-two matrix
$\begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}$ (considered
as a linear transformation on ℂ²) be positive is that it be Hermitian
symmetric (that is, that α and δ be real and $\gamma = \bar{\beta}$) and that
α ≥ 0, δ ≥ 0, and αδ − βγ ≥ 0.


7. Associated with each sequence (x₁, …, xₖ) of k vectors in an inner product
space there is a k-by-k matrix (not a linear transformation) called the
Gramian of (x₁, …, xₖ) and denoted by G(x₁, …, xₖ); the element in the i-th
row and j-th column of G(x₁, …, xₖ) is the inner product (xᵢ, xⱼ). Prove that
every Gramian is a positive matrix.


8. If x and y are non-zero vectors (in a finite-dimensional inner product
space), then a necessary and sufficient condition that there exist a positive
transformation A such that Ax = y is that (x, y) > 0.


9. (a) If the matrices
$A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$ and
$B = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$ are considered as linear
transformations on ℂ², and if C is a Hermitian matrix (linear transformation
on ℂ²) such that A ≤ C and B ≤ C, then

$C = \begin{pmatrix} 1 + \varepsilon & \theta \\ \bar{\theta} & 1 + \delta \end{pmatrix},$

where ε and δ are positive real numbers and
|θ|² ≤ min {ε(1 + δ), δ(1 + ε)}.


(b) If, moreover, C ≤ 1, then ε = δ = θ = 0. In modern terminology these
facts together show that Hermitian matrices with the ordering induced by the
notion of positiveness do not form a lattice. In the real case, if the matrix
$\begin{pmatrix} \alpha & \beta \\ \beta & \gamma \end{pmatrix}$ is
interpreted as the point (α, β, γ) in three-dimensional space, the ordering
and its non-lattice character take on an amusing geometric aspect.


§ 73. Isometries 


We continue with our program of investigating the analogy between
numbers and transformations. When does a complex number ζ have absolute
value one? Clearly a necessary and sufficient condition is that
$\bar{\zeta} = 1/\zeta$; guided by our heuristic principle, we are led to
consider linear transformations U for which U* = U⁻¹, or, equivalently, for
which UU* = U*U = 1. (We observe that on a finite-dimensional vector space
either of the two conditions UU* = 1 and U*U = 1 implies the other; see § 36,
Theorems 1 and 2.) Such transformations are called orthogonal or unitary
according as the underlying inner product space is real or complex. We
proceed to derive a couple of useful alternative characterizations of them.


THEOREM. The following three conditions on a linear transformation U on
an inner product space are equivalent to each other.


(1) U*U = 1,
(2) (Ux, Uy) = (x, y) for all x and y,
(3) ‖Ux‖ = ‖x‖ for all x.


PROOF. If (1) holds, then

(Ux, Uy) = (U*Ux, y) = (x, y)

for all x and y, and, in particular,

‖Ux‖² = ‖x‖²


for all x; this proves both the implications (1) ⇒ (2) and (2) ⇒ (3). The
proof can be completed by showing that (3) implies (1). If (3) holds, that
is, if (U*Ux, x) = (x, x) for all x, then § 71, Theorem 2 is applicable to the
(self-adjoint) transformation U*U − 1; the conclusion is that U*U = 1
(as desired).
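
A small numerical experiment illustrates conditions (2) and (3). In the
following sketch the matrix U and the vectors x and y are our own choices,
and the helper names are hypothetical; U is a two-by-two matrix for which
U*U = 1, as a direct multiplication shows.

```python
import math


def inner(x, y):
    """Inner product, linear in x and conjugate linear in y."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))


def apply_matrix(U, x):
    """Apply the matrix U (a list of rows) to the vector x."""
    return [sum(u * a for u, a in zip(row, x)) for row in U]


r = 1 / math.sqrt(2)
U = [[r, r * 1j], [r * 1j, r]]   # satisfies U*U = 1

x = [1 + 2j, -1j]
y = [3, 2 - 1j]
Ux, Uy = apply_matrix(U, x), apply_matrix(U, y)

# Condition (2): inner products are preserved (up to rounding).
preserves_inner_product = abs(inner(Ux, Uy) - inner(x, y)) < 1e-12
# Condition (3): norms are preserved (up to rounding).
preserves_norm = abs(inner(Ux, Ux) - inner(x, x)) < 1e-12
```

Both comparisons are made with a small tolerance because the entries of U
are floating-point approximations.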

Since (3) implies that


(4) ‖Ux − Uy‖ = ‖x − y‖


for all x and y (the converse implication (4) ⇒ (3) is also true and trivial),
we see that transformations of the type that the theorem deals with are
characterized by the fact that they preserve distances. For this reason we
shall call such a transformation an isometry. Since, as we have already
remarked, an isometry on a finite-dimensional space is necessarily orthogonal
or unitary (according as the space is real or complex), use of this
terminology will enable us to treat the real and the complex cases
simultaneously. We observe that (on a finite-dimensional space) an isometry is
always invertible and that U⁻¹ (= U*) is an isometry along with U.

In any algebraic system, and in particular in general vector spaces and 
inner product spaces, it is of interest to consider the automorphisms of the 
system, that is, to consider those one-to-one mappings of the system onto 
itself that preserve all the structural relations among its elements. We 
have already seen that the automorphisms of a general vector space are 
the invertible linear transformations. In an inner product space we re- 
quire more of an automorphism, namely, that it also preserve inner prod- 
ucts (and consequently lengths and distances). The preceding theorem 
shows that this requirement is equivalent to the condition that the trans- 
formation be an isometry. (We are assuming finite-dimensionality here; 
on infinite-dimensional spaces the range of an isometry need not be the 
entire space. This unimportant sacrifice in generality is for the sake of 
terminological convenience; for infinite-dimensional spaces there is no com- 
monly used word that describes orthogonal and unitary transformations 
simultaneously.) Thus the two questions “What linear transformations 
are the analogues of complex numbers of absolute value one?” and “What 
are the most general automorphisms of a finite-dimensional inner product 
space?” have the same answer: isometries. In the next section we shall 
show that isometries also furnish the answer to a third important question. 


§ 74. Change of orthonormal basis


We have seen that the theory of the passage from one linear basis of a 
vector space to another is best studied by means of an associated linear 
transformation A (§§ 46, 47); the question arises as to what special proper- 
ties A has when we pass from one orthonormal basis of an inner product 
space to another. The answer is easy. 


THEOREM 1. If 𝒳 = {x₁, …, xₙ} is an orthonormal basis of an n-dimensional
inner product space 𝒱, and if U is an isometry on 𝒱, then U𝒳 =
{Ux₁, …, Uxₙ} is also an orthonormal basis of 𝒱. Conversely, if U is a
linear transformation and 𝒳 is an orthonormal basis with the property that
U𝒳 is also an orthonormal basis, then U is an isometry.


PROOF. Since (Uxᵢ, Uxⱼ) = (xᵢ, xⱼ) = δᵢⱼ, it follows that U𝒳 is an
orthonormal set along with 𝒳; it is complete if 𝒳 is, since (x, Uxᵢ) = 0 for
i = 1, …, n implies that (U*x, xᵢ) = 0 and hence that U*x = x = 0. If,
conversely, U𝒳 is a complete orthonormal set along with 𝒳, then we have
(Ux, Uy) = (x, y) whenever x and y are in 𝒳, and it is clear that by
linearity we obtain (Ux, Uy) = (x, y) for all x and y.

We observe that the matrix (uᵢⱼ) of an isometric transformation, with
respect to an arbitrary orthonormal basis, satisfies the conditions


$\sum_k \bar{u}_{ki}u_{kj} = \delta_{ij},$


and that, conversely, any such matrix, together with an orthonormal basis,
defines an isometry. (Proof: U*U = 1. In the real case the bars may be
omitted.) For brevity we shall say that a matrix satisfying these
conditions is an isometric matrix.
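
The defining conditions of an isometric matrix translate directly into a
test that can be run on a given matrix. The following sketch is ours (the
function name is_isometric, the tolerance, and the two sample matrices are
assumptions made for the illustration); it checks the sums
$\sum_k \bar{u}_{ki}u_{kj}$ one pair (i, j) at a time.

```python
import math


def is_isometric(U, tol=1e-12):
    """Check that sum_k conj(u_ki) * u_kj equals delta_ij for all i, j."""
    n = len(U)
    for i in range(n):
        for j in range(n):
            s = sum(complex(U[k][i]).conjugate() * U[k][j] for k in range(n))
            expected = 1 if i == j else 0
            if abs(s - expected) > tol:
                return False
    return True


c, s = math.cos(0.3), math.sin(0.3)
rotation = [[c, -s], [s, c]]   # a rotation of the plane: isometric
shear = [[1, 1], [0, 1]]       # a shear: invertible but not isometric

rotation_ok = is_isometric(rotation)
shear_ok = is_isometric(shear)
```

The tolerance is needed only because the entries of the rotation are
floating-point numbers; for exact entries it could be taken to be zero.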

An interesting and easy consequence of our considerations concerning 
isometries is the following corollary of § 56, Theorem 1. 


THEOREM 2. If A is a linear transformation on a complex n-dimensional
inner product space 𝒱, then there exists an orthonormal basis 𝒳 in 𝒱 such
that the matrix [A; 𝒳] is triangular, or equivalently, if [A] is a matrix,
then there exists an isometric matrix [U] such that [U]⁻¹[A][U] is
triangular.


PROOF. In § 56, in the derivation of Theorem 2 from Theorem 1, we
constructed a (linear) basis 𝒳 = {x₁, …, xₙ} with the property that x₁,
…, xⱼ lie in 𝔐ⱼ and span 𝔐ⱼ for j = 1, …, n, and we showed that with
respect to this basis the matrix of A is triangular. If we knew that this
basis is also an orthonormal basis, we could apply Theorem 1 of the present
section to obtain the desired result. If 𝒳 is not an orthonormal basis, it is
easy to make it into one; this is precisely what the Gram-Schmidt
orthogonalization process (§ 65) can do. Here we use a special property of
the Gram-Schmidt process, namely, that the j-th element of the orthonormal
basis it constructs is a linear combination of x₁, …, xⱼ and lies therefore
in 𝔐ⱼ.
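
The special property of the Gram-Schmidt process used in this proof is
visible in any implementation of it. The sketch below is one such
implementation under our own assumptions (vectors are lists of numbers, the
input is linearly independent, and the names inner and gram_schmidt are
ours): the j-th output vector is built from the j-th input vector by
subtracting multiples of the earlier outputs, and is therefore a linear
combination of the first j inputs.

```python
import math


def inner(x, y):
    """Inner product, linear in x and conjugate linear in y."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))


def gram_schmidt(vectors):
    """Orthonormalize a linearly independent list of vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            c = inner(w, e)                      # component of w along e
            w = [a - c * b for a, b in zip(w, e)]
        norm = math.sqrt(inner(w, w).real)
        basis.append([a / norm for a in w])
    return basis


xs = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]   # a (non-orthonormal) basis
es = gram_schmidt(xs)
```

In this example the first output vector is a multiple of the first input
vector; in particular its third coordinate is zero, like that of the input.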


EXERCISES


1. If (Ax)(t) = x(−t) on 𝒫 (with the inner product given by
$(x, y) = \int_0^1 x(t)\overline{y(t)}\,dt$), is the linear transformation A
isometric? Is it self-adjoint?


2. For which values of a are the following matrices isometric? 


GI: TEE) 


3. Find a 3-by-3 isometric matrix whose first row is a multiple of (1, 1, 1). 


4. If a linear transformation has any two of the properties of being self-adjoint, 
isometric, or involutory, then it has the third. (Recall that an involution is a 
linear transformation A such that A? = 1.) 


5. If an isometric matrix is triangular, then it is diagonal. 


6. If (x₁, …, xₖ) and (y₁, …, yₖ) are two sequences of vectors in the same
inner product space, then a necessary and sufficient condition that there
exist an isometry U such that Uxᵢ = yᵢ, i = 1, …, k, is that (x₁, …, xₖ) and
(y₁, …, yₖ) have the same Gramian.


7. The mapping $\zeta \to \dfrac{\zeta + 1}{\zeta - 1}$ maps the imaginary
axis in the complex plane once around the unit circle, missing the point 1;
the inverse mapping (from the circle minus a point to the imaginary axis) is
given by the same formula. The transformation analogues of these geometric
facts are as follows.

(a) If A is skew, then A − 1 is invertible.

(b) If U = (A + 1)(A − 1)⁻¹, then U is isometric. (Hint: ‖(A + 1)y‖²
= ‖(A − 1)y‖² for every y.)

(c) U − 1 is invertible.

(d) If U is isometric and U − 1 is invertible, and if A = (U + 1)(U − 1)⁻¹,
then A is skew.

Each of A and U is known as the Cayley transform of the other.


8. Suppose that U is a transformation (not assumed to be linear) that maps an
inner product space 𝒱 onto itself (that is, if x is in 𝒱, then Ux is in 𝒱,
and if y is in 𝒱, then y = Ux for some x in 𝒱), in such a way that
(Ux, Uy) = (x, y) for all x and y.

(a) Prove that U is one-to-one and that if the inverse transformation is
denoted by U⁻¹, then (U⁻¹x, U⁻¹y) = (x, y) and (Ux, y) = (x, U⁻¹y) for all
x and y.

(b) Prove that U is linear. (Hint: (x, U⁻¹y) depends linearly on x.)


9. A conjugation is a transformation J (not assumed to be linear) that maps a
unitary space onto itself and is such that J² = 1 and (Jx, Jy) = (y, x) for
all x and y.

(a) Give an example of a conjugation.

(b) Prove that (Jx, y) = (Jy, x).

(c) Prove that J(x + y) = Jx + Jy.

(d) Prove that $J(\alpha x) = \bar{\alpha} \cdot Jx$.


10. A linear transformation A is said to be real with respect to a
conjugation J if AJ = JA.

(a) Give an example of a Hermitian transformation that is not real, and give 
an example of a real transformation that is not Hermitian. 

(b) If A is real, then the spectrum of A is symmetric about the real axis. 

(c) If A is real, then so is A*. 


11. § 74, Theorem 2 shows that the triangular form can be achieved by an 
orthonormal basis; is the same thing true for the Jordan form? 


12. If tr A = 0, then there exists an isometric matrix U such that all the
diagonal entries of [U]⁻¹[A][U] are zero. (Hint: see § 56, Ex. 6.)


§ 75. Perpendicular projections 


We are now in a position to fulfill our earlier promise to investigate the
projections associated with the particular direct sum decompositions 𝒱 =
𝔐 ⊕ 𝔐⊥. We shall call such a projection a perpendicular projection.
Since 𝔐⊥ is uniquely determined by the subspace 𝔐, we need not specify
both the direct summands associated with a projection if we already know
that it is perpendicular. We shall call the (perpendicular) projection E on
𝔐 along 𝔐⊥ simply the projection on 𝔐 and we shall write E = P_𝔐.


THEOREM 1. A linear transformation E is a perpendicular projection if
and only if E = E² = E*. Perpendicular projections are positive linear
transformations and have the property that ‖Ex‖ ≤ ‖x‖ for all x.


PROOF. If E is a perpendicular projection, then § 45, Theorem 1 and the
theorem of § 20 show (after, of course, the usual replacements, such as 𝔐⊥
for 𝔐⁰ and A* for A′) that E = E*. Conversely, if E = E² = E*, then
the idempotence of E assures us that E is the projection on ℜ along 𝔑,
where, of course, ℜ = ℜ(E) and 𝔑 = 𝔑(E) are the range and the null-space
of E, respectively. Hence we need only show that ℜ and 𝔑 are orthogonal.
For this purpose let x be any element of ℜ and y any element of 𝔑; the
desired result follows from the relation

(x, y) = (Ex, y) = (x, E*y) = (x, Ey) = 0.

The positive character of an E satisfying E = E² = E* follows from

(Ex, x) = (E²x, x) = (Ex, E*x) = (Ex, Ex) = ‖Ex‖² ≥ 0.

Applying this result to the perpendicular projection 1 − E, we see that

‖x‖² − ‖Ex‖² = (x, x) − (Ex, x) = ([1 − E]x, x) ≥ 0;


this concludes the proof of the theorem.
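
The simplest perpendicular projections are those on one-dimensional
subspaces, and for them the assertions of the theorem can be checked by
direct computation. In the sketch below (an illustration only; the vector
v, the test vector x, and all function names are our own) the projection on
the span of v is the transformation x → ((x, v)/(v, v))v, whose matrix with
respect to the natural basis has entries $v_i\bar{v}_j/(v, v)$.

```python
def inner(x, y):
    """Inner product, linear in x and conjugate linear in y."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))


def apply_matrix(A, x):
    """Apply the matrix A (a list of rows) to the vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def matmul(A, B):
    """Matrix product AB."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def adjoint(A):
    """Conjugate transpose of the matrix A."""
    return [[complex(A[i][j]).conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]


def close(A, B, tol=1e-12):
    """Entrywise comparison of two matrices up to a tolerance."""
    return all(abs(a - b) <= tol
               for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def projection_matrix(v):
    """Matrix of the perpendicular projection on the span of v."""
    vv = inner(v, v).real
    return [[vi * complex(vj).conjugate() / vv for vj in v] for vi in v]


v = [1, 2j, -2]
E = projection_matrix(v)

idempotent = close(matmul(E, E), E)       # E = E^2
self_adjoint = close(adjoint(E), E)       # E = E*

x = [3 - 1j, 1, 2j]
Ex = apply_matrix(E, x)
norm_decreases = inner(Ex, Ex).real <= inner(x, x).real + 1e-12
```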

For some of the generalizations of our theory it is useful to know that
idempotence together with the last property mentioned in Theorem 1 is
also characteristic of perpendicular projections.


THEOREM 2. If a linear transformation E is such that E = E² and
‖Ex‖ ≤ ‖x‖ for all x, then E = E*.


PROOF. We are to show that the range ℜ and the null-space 𝔑 of E are
orthogonal. If x is in 𝔑⊥, then y = Ex − x is in 𝔑, since Ey = E²x − Ex
= Ex − Ex = 0. Hence Ex = x + y with (x, y) = 0, so that


‖x‖² ≥ ‖Ex‖² = ‖x‖² + ‖y‖² ≥ ‖x‖²,


and therefore y = 0. Consequently Ex = x, so that x is in ℜ; this proves
that 𝔑⊥ ⊂ ℜ. Conversely, if z is in ℜ, so that Ez = z, we write z = x + y
with x in 𝔑⊥ and y in 𝔑. Then z = Ez = Ex + Ey = Ex = x. (The reason
for the last equality is that x is in 𝔑⊥ and therefore in ℜ.) Hence z is
in 𝔑⊥, so that ℜ ⊂ 𝔑⊥, and therefore ℜ = 𝔑⊥.

We shall need also the fact that the theorem of § 42 remains true if the 
word “projection” is qualified throughout by “perpendicular.” This is an 
immediate consequence of the preceding characterization of perpendicular 
projections and of the fact that sums and differences of self-adjoint trans- 
formations are self-adjoint, whereas the product of two self-adjoint trans- 
formations is self-adjoint if and only if they commute. By our present 
geometric methods it is also quite easy to generalize the part of the theorem 
dealing with sums from two summands to any finite number. The generali- 
zation is most conveniently stated in terms of the concept of orthogonality 
for projections; we shall say that two (perpendicular) projections E and F 
are orthogonal if EF = 0. (Consideration of adjoints shows that this is 
equivalent to FE = 0.) The following theorem shows that the geometric 
language is justified. 


THEOREM 3. Two perpendicular projections E = P_𝔐 and F = P_𝔑 are
orthogonal if and only if the subspaces 𝔐 and 𝔑 (that is, the ranges of E
and F) are orthogonal.


PROOF. If EF = 0, and if x and y are in the ranges of E and F
respectively, then


(x, y) = (Ex, Fy) = (x, E*Fy) = (x, EFy) = 0.

If, conversely, 𝔐 and 𝔑 are orthogonal (so that 𝔑 ⊂ 𝔐⊥), then the fact
that Ex = 0 for x in 𝔐⊥ implies that EFx = 0 for all x (since Fx is in 𝔑
and consequently in 𝔐⊥).


§ 76. Combinations of perpendicular projections


The sum theorem for perpendicular projections is now easy. 


THEOREM 1. If E₁, …, Eₙ are (perpendicular) projections, then a necessary
and sufficient condition that E = E₁ + ⋯ + Eₙ be a (perpendicular)
projection is that EᵢEⱼ = 0 whenever i ≠ j (that is, that the Eᵢ be pairwise
orthogonal).


PROOF. The proof of the sufficiency of the condition is trivial; we prove
explicitly its necessity only, so that we now assume that E is a
perpendicular projection. If x belongs to the range of some Eᵢ, then

$\|x\|^2 \ge \|Ex\|^2 = (Ex, x) = (\sum_j E_j x, x) = \sum_j (E_j x, x) = \sum_j \|E_j x\|^2 \ge \|E_i x\|^2 = \|x\|^2,$

so that we must have equality all along. Since, in particular, we must have

$\sum_j \|E_j x\|^2 = \|E_i x\|^2,$

it follows that Eⱼx = 0 whenever j ≠ i. In other words, every x in the
range of Eᵢ is in the null-space (and, consequently, is orthogonal to the
range) of every Eⱼ with j ≠ i; using § 75, Theorem 3, we draw the desired
conclusion.

We end our discussion of projections with a brief study of order relations.
It is tempting to write E ≤ F, for two perpendicular projections E = P_𝔐
and F = P_𝔑, whenever 𝔐 ⊂ 𝔑. Earlier, however, we interpreted the sign
≤, when used in an expression involving linear transformations E and F
(as in E ≤ F), to mean that F − E is a positive transformation. There
are also other possible reasons for considering E to be smaller than F; we
might have ‖Ex‖ ≤ ‖Fx‖ for all x, or FE = EF = E (see § 42, (ii)).
The situation is straightened out by the following theorem, which plays
here a role similar to that of § 75, Theorem 3, that is, it establishes the
coincidence of several seemingly different concepts concerning projections,
some of which are defined algebraically while others refer to the underlying
geometrical objects.


THEOREM 2. For perpendicular projections E = P_𝔐 and F = P_𝔑 the
following conditions are mutually equivalent.


(i) E ≤ F.
(ii) ‖Ex‖ ≤ ‖Fx‖ for all x.
(iii) 𝔐 ⊂ 𝔑.
(iva) FE = E.
(ivb) EF = E.


PROOF. We shall prove the implication relations (i) ⇒ (ii) ⇒ (iii) ⇒
(iva) ⇒ (ivb) ⇒ (i).

(i) ⇒ (ii). If E ≤ F, then, for all x,


0 ≤ ([F − E]x, x) = (Fx, x) − (Ex, x) = ‖Fx‖² − ‖Ex‖²


(since E and F are perpendicular projections).

(ii) ⇒ (iii). We assume that ‖Ex‖ ≤ ‖Fx‖ for all x. Let us now
take any x in 𝔐; then we have


‖x‖ ≥ ‖Fx‖ ≥ ‖Ex‖ = ‖x‖,

so that ‖Fx‖ = ‖x‖, or (x, x) − (Fx, x) = 0, whence

([1 − F]x, x) = ‖(1 − F)x‖² = 0,


and consequently x = Fx. In other words, x in 𝔐 implies that x is in 𝔑,
as was to be proved.

(iii) ⇒ (iva). If 𝔐 ⊂ 𝔑, then Ex is in 𝔑 for all x, so that FEx = Ex
for all x, as was to be proved.

That (iva) implies (ivb), and is in fact equivalent to it, follows by taking
adjoints.

(iv) ⇒ (i). If EF = FE = E, then, for all x,


(Fx, x) − (Ex, x) = (Fx, x) − (FEx, x) = (F[1 − E]x, x).


Since E and F are commutative projections, so also are (1 − E) and F,
and consequently G = F(1 − E) is a projection. Hence


(Fx, x) − (Ex, x) = (Gx, x) = ‖Gx‖² ≥ 0.


This completes the proof of Theorem 2.
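
The equivalences of Theorem 2 can be observed on a small real example. The
following sketch makes assumptions of its own: the space is real
three-dimensional coordinate space, 𝔐 is the span of a unit vector u, 𝔑 is
the span of the orthonormal vectors u and w (so that 𝔐 ⊂ 𝔑), and the helper
names are hypothetical.

```python
import math


def inner(x, y):
    """Real inner product."""
    return sum(a * b for a, b in zip(x, y))


def apply_matrix(A, x):
    """Apply the matrix A (a list of rows) to the vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def matmul(A, B):
    """Matrix product AB."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def projection_onto(orthonormal_vectors):
    """Matrix of the perpendicular projection on the span of the vectors."""
    n = len(orthonormal_vectors[0])
    return [[sum(u[i] * u[j] for u in orthonormal_vectors)
             for j in range(n)] for i in range(n)]


def close(A, B, tol=1e-12):
    """Entrywise comparison of two matrices up to a tolerance."""
    return all(abs(a - b) <= tol
               for ra, rb in zip(A, B) for a, b in zip(ra, rb))


r = 1 / math.sqrt(2)
u = [r, r, 0.0]
w = [0.0, 0.0, 1.0]
E = projection_onto([u])       # projection on M = span{u}
F = projection_onto([u, w])    # projection on N = span{u, w}

fe_equals_e = close(matmul(F, E), E)    # condition (iva)
ef_equals_e = close(matmul(E, F), E)    # condition (ivb)

samples = [[1.0, 2.0, 3.0], [-4.0, 0.5, 2.0], [0.0, 1.0, -1.0]]
norms_ordered = all(                     # condition (ii)
    inner(apply_matrix(E, x), apply_matrix(E, x))
    <= inner(apply_matrix(F, x), apply_matrix(F, x)) + 1e-12
    for x in samples)
difference_positive = all(               # condition (i) on the samples
    inner(apply_matrix(F, x), x) - inner(apply_matrix(E, x), x) >= -1e-12
    for x in samples)
```

A finite set of sample vectors cannot, of course, establish (i) or (ii); the
checks merely illustrate what the theorem asserts.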

In terms of the concepts introduced by now, it is possible to give a quite
intuitive sounding formulation of the theorem of § 42 (in so far as it applies
to perpendicular projections), as follows. For two perpendicular
projections E and F, their sum, product, or difference is also a perpendicular
projection if and only if F is respectively orthogonal to, commutative with,
or greater than E.


EXERCISES 


1. (a) Give an example of a projection that is not a perpendicular projection.
(b) Give an example of two projections E and F (they cannot both be
perpendicular) such that EF = 0 and FE ≠ 0.


2. Find the (perpendicular) projection of (1, 1, 1) on the (one-dimensional)
subspace of ℂ³ spanned by (1, −1, 1). (In other words: find the image of the
given vector under the projection onto the given subspace.)


3. Find the matrices of all perpendicular projections on ℂ².


4. If U = 2E − 1, then a necessary and sufficient condition that U be an
involutory isometry is that E be a perpendicular projection.


5. A linear transformation U is called a partial isometry if there exists a
subspace 𝔐 such that ‖Ux‖ = ‖x‖ whenever x is in 𝔐 and Ux = 0 whenever
x is in 𝔐⊥.

(a) The adjoint of a partial isometry is a partial isometry.

(b) If U is a partial isometry and if 𝔐 is a subspace such that ‖Ux‖ = ‖x‖
or 0 according as x is in 𝔐 or in 𝔐⊥, then U*U is the perpendicular
projection on 𝔐.

(c) Each of the following four conditions is necessary and sufficient that a
linear transformation U be a partial isometry. (i) UU*U = U, (ii) U*U is a
projection, (iii) U*UU* = U*, (iv) UU* is a projection.

(d) If λ is a proper value of a partial isometry, then |λ| ≤ 1.

(e) Give an example of a partial isometry that has ½ as a proper value.


6. Suppose that A is a linear transformation on, and 𝔐 is a subspace of, a
finite-dimensional vector space 𝒱. Prove that if dim 𝔐 ≤ dim 𝔐⊥, then
there exist linear transformations B and C on 𝒱 such that Ax = (BC − CB)x
for all x in 𝔐. (Hint: let B be a partial isometry such that ‖Bx‖ = ‖x‖ or
0 according as x is in 𝔐 or in 𝔐⊥ and such that ℜ(B) ⊂ 𝔐⊥.)

§ 77. Complexification 


In the past few sections we have been treating real and complex vector
spaces simultaneously. Sometimes this is not possible; the complex number
system is richer than the real. There are theorems that are true for
both real and complex spaces, but for which the proof is much easier in
the complex case, and there are theorems that are true for complex spaces
but not for real ones. (An example of the latter kind is the assertion that
if the space is finite-dimensional, then every linear transformation has a
proper value.) For these reasons, it is frequently handy to be able to
“complexify” a real vector space, that is, to associate with it a complex
vector space with essentially the same properties. The purpose of this
section is to describe such a process of complexification.

Suppose that 𝒱 is a real vector space, and let 𝒱⁺ be the set of all ordered
pairs ⟨x, y⟩ with both x and y in 𝒱. Define the sum of two elements of
𝒱⁺ by

⟨x₁, y₁⟩ + ⟨x₂, y₂⟩ = ⟨x₁ + x₂, y₁ + y₂⟩,


and define the product of an element of 𝒱⁺ by a complex number α + iβ
(α and β real, i = √−1) by

(α + iβ)⟨x, y⟩ = ⟨αx − βy, βx + αy⟩.


(To remember these formulas, pretend that ⟨x, y⟩ means x + iy.) A
straightforward and only slightly laborious computation shows that the
set 𝒱⁺ becomes a complex vector space with respect to these definitions
of the linear operations.
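
The two definitions can be transcribed almost word for word. The sketch
below is an illustration under our own assumptions: the real space is taken
to be a coordinate space whose vectors are lists of floating-point numbers,
an element of the complexification is a Python pair (x, y) of such lists,
and the function names add and scale are ours.

```python
def add(p, q):
    """<x1, y1> + <x2, y2> = <x1 + x2, y1 + y2>."""
    (x1, y1), (x2, y2) = p, q
    return ([a + b for a, b in zip(x1, x2)],
            [a + b for a, b in zip(y1, y2)])


def scale(c, p):
    """(alpha + i*beta)<x, y> = <alpha*x - beta*y, beta*x + alpha*y>."""
    alpha, beta = complex(c).real, complex(c).imag
    x, y = p
    return ([alpha * a - beta * b for a, b in zip(x, y)],
            [beta * a + alpha * b for a, b in zip(x, y)])


x, y = [1.0, 2.0], [3.0, -1.0]
zero = [0.0, 0.0]
p = (x, y)

# Multiplying twice by i gives the negative of the original pair.
minus_p = scale(1j, scale(1j, p))

# The pair <x, y> is recovered as <x, 0> + i<y, 0>.
rebuilt = add((x, zero), scale(1j, (y, zero)))

# One instance of the associative law (c1*c2)p = c1(c2*p).
c1, c2 = 2 + 1j, 1 - 3j
left = scale(c1 * c2, p)
right = scale(c1, scale(c2, p))
```

The checks made here are only instances of the vector space axioms, not a
verification of them; they show the sense in which the pair ⟨x, y⟩ behaves
like x + iy.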

The set of those elements ⟨x, y⟩ of 𝒱⁺ for which y = 0 is in a natural
one-to-one correspondence with the space 𝒱. Being a complex vector
space, the space 𝒱⁺ may also be regarded as a real vector space; if we
identify each element x of 𝒱 with its replica ⟨x, 0⟩ in 𝒱⁺ (it is exceedingly
convenient to do this), we may say that 𝒱⁺ (as a real vector space)
includes 𝒱. Since ⟨0, y⟩ = i⟨y, 0⟩, so that ⟨x, y⟩ = ⟨x, 0⟩ + i⟨y, 0⟩, our
identification convention enables us to say that every vector in 𝒱⁺ has
the form x + iy, with x and y in 𝒱. Since 𝒱 and i𝒱 (where i𝒱 denotes the
set of all elements ⟨x, y⟩ in 𝒱⁺ with x = 0) are subsets of 𝒱⁺ with only
0 (that is, ⟨0, 0⟩) in common, it follows that the representation of a vector
of 𝒱⁺ in the form x + iy (with x and y in 𝒱) is unique. We have thus
constructed a complex vector space 𝒱⁺ with the property that 𝒱⁺
considered as a real space includes 𝒱 as a subspace, and such that 𝒱⁺ is the
direct sum of 𝒱 and i𝒱. (Here i𝒱 denotes the set of all those elements
of 𝒱⁺ that have the form iy for some y in 𝒱.) We shall call 𝒱⁺ the
complexification of 𝒱.

If {x₁, …, xₙ} is a linearly independent set in 𝒱 (real coefficients),
then it is also a linearly independent set in 𝒱⁺ (complex coefficients).
Indeed, if α₁, …, αₙ, β₁, …, βₙ are real numbers such that
Σⱼ (αⱼ + iβⱼ)xⱼ = 0, then (Σⱼ αⱼxⱼ) + i(Σⱼ βⱼxⱼ) = 0, and consequently, by
the uniqueness of the representation of vectors in 𝒱⁺ by means of vectors in
𝒱, it follows that Σⱼ αⱼxⱼ = Σⱼ βⱼxⱼ = 0; the desired result is now implied
by the assumed (real) linear independence of {x₁, …, xₙ} in 𝒱. If,
moreover, {x₁, …, xₙ} is a basis in 𝒱 (real coefficients), then it is also a
basis in 𝒱⁺ (complex coefficients). Indeed, if x and y are in 𝒱, then there
exist real numbers α₁, …, αₙ, β₁, …, βₙ such that x = Σⱼ αⱼxⱼ and y =
Σⱼ βⱼxⱼ; it follows that x + iy = Σⱼ (αⱼ + iβⱼ)xⱼ, and hence that
{x₁, …, xₙ} spans 𝒱⁺. These results imply that the complex vector space 𝒱⁺
has the same dimension as the real vector space 𝒱.

There is a natural way to extend every linear transformation A on V
to a linear transformation A⁺ on V⁺; we write

\[ A^{+}(x + iy) = Ax + iAy \]

whenever x and y are in V. (The verification that A⁺ is indeed a linear
transformation on V⁺ is routine.) A similar extension works for linear and
even multilinear functionals. If, for instance, w is a (real) bilinear
functional on V, its extension to V⁺ is the (complex) bilinear functional
defined by

\[ w^{+}(x_1 + iy_1,\, x_2 + iy_2) = w(x_1, x_2) - w(y_1, y_2) + i\bigl(w(x_1, y_2) + w(y_1, x_2)\bigr). \]

If, on the other hand, w is alternating, then the same is true of w⁺. Indeed,
the real and imaginary parts of w⁺(x + iy, x + iy) are w(x, x) − w(y, y)
and w(x, y) + w(y, x) respectively; if w is alternating, then w is skew
symmetric (§ 30, Theorem 1), and therefore w⁺ is alternating. The same proof
establishes the corresponding result for k-linear functionals also, for all
values of k. From this and from the definition of determinants it follows
that det A = det A⁺ for every linear transformation A on V.

The method of extending bilinear functionals works for conjugate
bilinear functionals also. If, that is, V is a (real) inner product space, then
there is a natural way of introducing a (complex) inner product into V⁺;
we write, by definition,

\[ (x_1 + iy_1,\, x_2 + iy_2) = (x_1, x_2) + (y_1, y_2) - i\bigl((x_1, y_2) - (y_1, x_2)\bigr). \]

Observe that if x and y are orthogonal vectors in V, then

\[ \|x + iy\|^2 = \|x\|^2 + \|y\|^2. \]
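As a small numerical check of the formula for the complexified inner product, the following sketch takes V to be the real plane with the ordinary dot product (an assumption made only for illustration; the helper names are hypothetical). It compares the formula with the usual Hermitian product on pairs of complex numbers, which is conjugate linear in the second argument.

```python
def real_inner(u, v):
    """Ordinary dot product on the real space V (assumed here to be R^2)."""
    return sum(a * b for a, b in zip(u, v))

def complexified_inner(x1, y1, x2, y2):
    """(x1 + i y1, x2 + i y2), built only from the real inner product."""
    real_part = real_inner(x1, x2) + real_inner(y1, y2)
    imag_part = -(real_inner(x1, y2) - real_inner(y1, x2))
    return complex(real_part, imag_part)

def hermitian_inner(z, w):
    """Standard inner product on C^2, conjugate linear in the second slot."""
    return sum(a * b.conjugate() for a, b in zip(z, w))

# Arbitrary sample vectors in V.
x1, y1 = [1, 2], [3, -1]
x2, y2 = [0, 4], [2, 5]

# The same vectors viewed as elements x + iy of the complexification.
z1 = [a + 1j * b for a, b in zip(x1, y1)]
z2 = [a + 1j * b for a, b in zip(x2, y2)]

from_formula = complexified_inner(x1, y1, x2, y2)
from_coordinates = hermitian_inner(z1, z2)
norm_squared = complexified_inner(x1, y1, x1, y1)
```

The two computations agree, and the norm of x + iy comes out as the sum of the squared norms of x and y, since the imaginary part cancels by the symmetry of the real inner product.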


The correspondence from A to A⁺ preserves all algebraic properties of
transformations. Thus if B = αA (with α real), then B⁺ = αA⁺; if
C = A + B, then C⁺ = A⁺ + B⁺; and if C = AB, then C⁺ = A⁺B⁺.
If, moreover, V is an inner product space, and if B = A*, then B⁺ = (A⁺)*.
(Proof: evaluate (A⁺(x_1 + iy_1), x_2 + iy_2) and (x_1 + iy_1, B⁺(x_2 + iy_2)).)

If A is a linear transformation on V and if A⁺ has a proper vector
x + iy, with proper value α + iβ (where x and y are in V and α and β are
real), so that

\[ Ax = \alpha x - \beta y, \qquad Ay = \beta x + \alpha y, \]

then the subspace of V spanned by x and y is invariant under A. (Since
every linear transformation on a complex vector space has a proper vector,
we conclude that every linear transformation on a real vector space leaves
invariant a subspace of dimension equal to 1 or 2.) If, in particular, A⁺
happens to have a real proper value (that is, if β = 0), then A has the same
proper value (since Ax = αx, Ay = αy, and not both x and y can vanish).
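To make the extension concrete, here is a minimal sketch using a hypothetical example that is not taken from the text: the rotation of the real plane through a right angle. It has no real proper vector, but its extension has the proper vector x + iy shown below with proper value i, and the real and imaginary parts satisfy the pair of equations above with α = 0 and β = 1.

```python
def apply(matrix, vector):
    """Multiply a square matrix (a list of rows) by a vector (a list)."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]

# Hypothetical example: rotation of the real plane through a right angle.
A = [[0.0, -1.0],
     [1.0, 0.0]]

x = [1.0, 0.0]    # real part of the proper vector
y = [0.0, -1.0]   # imaginary part of the proper vector
z = [xr + 1j * yr for xr, yr in zip(x, y)]  # z = x + iy

# The same matrix applied over the complex numbers gives the extension ...
extended = apply(A, z)
# ... and this agrees with Ax + iAy computed from the real action of A.
from_real_parts = [ax + 1j * ay for ax, ay in zip(apply(A, x), apply(A, y))]

# z is a proper vector of the extension with proper value alpha + i*beta = i.
alpha, beta = 0.0, 1.0
proper_value = complex(alpha, beta)
```

Here the span of x and y is the whole plane, which is the two-dimensional invariant subspace promised by the parenthetical remark.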

We have already seen that every (real) basis in V is at the same time
a (complex) basis in V⁺. It follows that the matrix of a linear
transformation A on V, with respect to some basis X in V, is the same as the matrix
of A⁺ on V⁺, with respect to the basis X in V⁺. This comment is at the
root of the whole theory of complexification; the naive point of view on the
matter is that real matrices constitute a special case of complex matrices.


EXERCISES


1. What happens if the process of complexification described in § 77 is applied 
to a vector space that is already complex? 


2. Prove that there exists a unique isomorphism between the complexification
described in § 77 and the one described in § 25, Ex. 5 with the property that each
“real” vector (that is, each vector in the originally given real vector space)
corresponds to itself.


3. (a) What is the complexification of R¹?
(b) If V is an n-dimensional real vector space, what is the dimension of its
complexification V⁺, regarded as a real vector space?


4. Suppose that V⁺ is the complex inner product space obtained by complexifying
a real inner product space V.

(a) Prove that if V⁺ is regarded as a real vector space and if A(x + iy) = x − iy
whenever x and y are in V, then A is a linear transformation on V⁺.

(b) Is A self-adjoint? Isometric? Idempotent? Involutory?

(c) What if V⁺ is regarded as a complex space?


5. Discuss the relation between duality and complexification, and, in particular, 
the relation between the adjoint of a linear transformation on a real vector space 
and the adjoint of its complexification. 


6. If A is a linear transformation on a real vector space V and if a subspace M
of the complexification V⁺ is invariant under A⁺, then M+ ∩ V is invariant under A.


§ 78. Characterization of spectra 


The following results support the analogy between numbers and trans- 
formations more than anything so far; they assert that the properties that 
caused us to define the special classes of transformations we have been 
considering are reflected by their spectra. 


THEOREM 1. If A is a self-adjoint transformation on an inner product
space, then every proper value of A is real; if A is positive, or strictly positive,
then every proper value of A is positive, or strictly positive, respectively.


PROOF. We may ignore the fact that the first assertion is trivial in the
real case; the same proof serves to establish both assertions in both the
real and the complex case. Indeed, if Ax = λx, with x ≠ 0, then

\[ \frac{(Ax, x)}{\|x\|^2} = \frac{\lambda (x, x)}{\|x\|^2} = \lambda; \]

it follows that if (Ax, x) is real (see § 71, Theorem 4), then so is λ, and if
(Ax, x) is positive (or strictly positive) then so is λ.


THEOREM 2. Every root of the characteristic equation of a self-adjoint
transformation on a finite-dimensional inner product space is real.


PROOF. In the complex case roots of the characteristic equation are the 
same thing as proper values, and the result follows from Theorem 1. 
If A is a symmetric transformation on a Euclidean space, then its
complexification A⁺ is Hermitian, and the result follows from the fact that
A and A⁺ have the same characteristic equation.

We observe that it is an immediate consequence of Theorem 2 that a 
self-adjoint transformation on a finite-dimensional inner product space 
always has a proper value. 


THEOREM 3. Every proper value of an isometry has absolute value one. 


PROOF. If U is an isometry, and if Ux = λx, with x ≠ 0, then ||x||
= ||Ux|| = |λ|·||x||, and therefore |λ| = 1.


THEOREM 4. If A is either self-adjoint or isometric, then proper vectors 
of A belonging to distinct proper values are orthogonal. 


PROOF. Suppose Ax_1 = λ_1 x_1, Ax_2 = λ_2 x_2, λ_1 ≠ λ_2. If A is self-adjoint,
then

\[ \lambda_1 (x_1, x_2) = (Ax_1, x_2) = (x_1, Ax_2) = \lambda_2 (x_1, x_2). \tag{1} \]

(The middle step makes use of the self-adjoint character of A, and the last
step of the reality of λ_2.) In case A is an isometry, (1) is replaced by

\[ (x_1, x_2) = (Ax_1, Ax_2) = (\lambda_1 / \lambda_2)(x_1, x_2); \tag{2} \]

recall that λ̄_2 = 1/λ_2. In either case (x_1, x_2) ≠ 0 would imply that λ_1
= λ_2, so that we must have (x_1, x_2) = 0.


THEOREM 5. If a subspace M is invariant under an isometry U on a
finite-dimensional inner product space, then so is M⊥.


PROOF. Considered on the finite-dimensional subspace M, the
transformation U is still an isometry, and, consequently, it is invertible. It
follows that every x in M may be written in the form x = Uy with y
in M; in other words, if x is in M and if y = U⁻¹x, then y is in M. Hence
M is invariant under U⁻¹ = U*. It follows from § 45, Theorem 2, that
M⊥ is invariant under (U*)* = U.

We observe that the same result for self-adjoint transformations (even
in not necessarily finite-dimensional spaces) is trivial, since if M is invariant
under A, then M⊥ is invariant under A* = A.


THEOREM 6. If A is a self-adjoint transformation on a finite-dimensional
inner product space, then the algebraic multiplicity of each proper value
λ_0 of A is equal to its geometric multiplicity, that is, to the dimension of the
subspace M of all solutions of Ax = λ_0 x.


PROOF. It is clear that M is invariant under A, and therefore so is M⊥;
let us denote by B and C the linear transformation A considered only on
M and M⊥ respectively. We have

\[ \det(A - \lambda) = \det(B - \lambda) \cdot \det(C - \lambda) \]

for all λ. Since B is a self-adjoint transformation on a finite-dimensional
space, with only one proper value, namely, λ_0, it follows that λ_0 must
occur as a proper value of B with algebraic multiplicity equal to the
dimension of M. If that dimension is m, then det(B − λ) = (λ_0 − λ)^m.
Since, on the other hand, λ_0 is not a proper value of C at all, and since,
consequently, det(C − λ_0) ≠ 0, we see that det(A − λ) contains (λ_0 − λ)
as a factor exactly m times, as was to be proved.

What made this proof work was the invariance of M⊥ and the fact
that every root of the characteristic equation of A is a proper value of A.
The latter assertion is true for every linear transformation on a unitary
space; the following result is a consequence of these observations and of
Theorem 5.


THEOREM 7. If U is a unitary transformation on a finite-dimensional
unitary space, then the algebraic multiplicity of each proper value of U is
equal to its geometric multiplicity.


EXERCISES 


1. Give an example of a linear transformation with two non-orthogonal proper 
vectors belonging to distinct proper values. 


2. Give an example of a non-positive linear transformation (on a finite-di- 
mensional unitary space) all of whose proper values are positive. 


3. (a) If A is self-adjoint, then det A is real. 
(b) If A is unitary, then |det A| = 1.
(c) What can be said about the determinant of a partial isometry? 


§ 79. Spectral theorem 


We are now ready to prove the main theorem of this book, the theorem 
of which many of the other results of this chapter are immediate corollaries. 
To some extent what we have been doing up to now was a matter of 
sport (useful, however, for generalizations); we wanted to show how much 
can conveniently be done with spectral theory before proving the spectral 
theorem. In the complex case, incidentally, the spectral theorem can be
made to follow from the triangularization process we have already described;
because of the importance of the theorem we prefer to give below its
(quite easy) direct proof. The reader may find it profitable to adapt the 
method of proof (not the result) of § 56, Theorem 2, to prove as much as 
he can of the spectral theorem and its consequences. 


THEOREM 1. To every self-adjoint linear transformation A on a
finite-dimensional inner product space there correspond real numbers α_1, ..., α_r
and perpendicular projections E_1, ..., E_r (where r is a strictly positive
integer, not greater than the dimension of the space) so that


(1) the α_j are pairwise distinct,

(2) the E_j are pairwise orthogonal and different from 0,

(3) Σ_j E_j = 1,

(4) Σ_j α_j E_j = A.


PROOF. Let α_1, ..., α_r be the distinct proper values of A, and let E_j
be the perpendicular projection on the subspace consisting of all solutions
of Ax = α_j x (j = 1, ..., r). Condition (1) is then satisfied by definition;
the fact that the α's are real follows from § 78, Theorem 1. Condition (2)
follows from § 78, Theorem 4. From the orthogonality of the E_j we infer
that if E = Σ_j E_j, then E is a perpendicular projection. The dimension
of the range of E is the sum of the dimensions of the ranges of the E_j,
and consequently, by § 78, Theorem 6, the dimension of the range of E
is equal to the dimension of the entire space; this implies (3).
(Alternatively, if E ≠ 1, then A considered on the range of 1 − E would be a
self-adjoint transformation with no proper values.) To prove (4), take any
vector x and write x_j = E_j x; it follows that Ax_j = α_j x_j and hence that

\[ Ax = A\Bigl(\sum_j E_j x\Bigr) = \sum_j Ax_j = \sum_j \alpha_j x_j = \sum_j \alpha_j E_j x. \]


This completes the proof of the spectral theorem. 
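The four conditions of the theorem can be checked by hand on a small case. The sketch below uses a hypothetical two-by-two symmetric matrix chosen for illustration (it does not come from the text); its proper values are 1 and 3, with proper vectors (1, −1) and (1, 1), and the helper names are likewise assumptions.

```python
def projection(u):
    """Perpendicular projection onto the line spanned by the vector u."""
    norm_sq = sum(c * c for c in u)
    return [[a * b / norm_sq for b in u] for a in u]

def matmul(p, q):
    """Product of two square matrices given as lists of rows."""
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def combine(terms):
    """Sum of coeff * matrix over a list of (coeff, matrix) pairs."""
    n = len(terms[0][1])
    return [[sum(c * m[i][j] for c, m in terms) for j in range(n)]
            for i in range(n)]

# Hypothetical self-adjoint transformation on the real plane.
A = [[2.0, 1.0],
     [1.0, 2.0]]

alphas = [1.0, 3.0]                                      # condition (1)
E = [projection([1.0, -1.0]), projection([1.0, 1.0])]

product = matmul(E[0], E[1])                             # condition (2)
total = combine([(1.0, E[0]), (1.0, E[1])])              # condition (3)
spectral_sum = combine(list(zip(alphas, E)))             # condition (4)
```

In this example the product of the two projections is zero, their sum is the identity, and the weighted sum reproduces A.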

The representation A = Σ_j α_j E_j (where the α's and the E's satisfy the
conditions (1)-(3) of Theorem 1) is called a spectral form of A; the main
effect of the following result is to prove the uniqueness of the spectral form.


THEOREM 2. If Σ_{j=1}^{r} α_j E_j is the spectral form of a self-adjoint
transformation A on a finite-dimensional inner product space, then the α's are all the
distinct proper values of A. If, moreover, 1 ≤ k ≤ r, then there exist
polynomials p_k, with real coefficients, such that p_k(α_j) = 0 whenever j ≠ k
and such that p_k(α_k) = 1; for every such polynomial p_k(A) = E_k.


PROOF. Since E_j ≠ 0, there exists a non-zero vector x in the range of E_j.
Since E_j x = x and E_i x = 0 whenever i ≠ j, it follows that

\[ Ax = \sum_i \alpha_i E_i x = \alpha_j E_j x = \alpha_j x, \]

so that each α_j is a proper value of A. If, conversely, λ is any proper value
of A, say Ax = λx with x ≠ 0, then we write x_j = E_j x and we see that

\[ Ax = \lambda x = \lambda \sum_j x_j \]

and

\[ Ax = A \sum_j x_j = \sum_j \alpha_j x_j, \]

so that Σ_j (λ − α_j) x_j = 0. Since the x_j are pairwise orthogonal, those
among them that are not zero form a linearly independent set. It follows
that, for each j, either x_j = 0 or else λ = α_j. Since x ≠ 0, we must have
x_j ≠ 0 for some j, and consequently λ is indeed equal to one of the α's.

Since E_i E_j = 0 if i ≠ j, and E_j² = E_j, it follows that

\[ A^2 = \Bigl(\sum_i \alpha_i E_i\Bigr)\Bigl(\sum_j \alpha_j E_j\Bigr) = \sum_i \sum_j \alpha_i \alpha_j E_i E_j = \sum_j \alpha_j^2 E_j. \]

Similarly

\[ A^n = \sum_j \alpha_j^n E_j \]

for every positive integer n (in case n = 0, use (3)), and hence

\[ p(A) = \sum_j p(\alpha_j) E_j \]

for every polynomial p. To conclude the proof of the theorem, all we need
to do is to exhibit a (real) polynomial p_k such that p_k(α_j) = 0 whenever
j ≠ k and such that p_k(α_k) = 1. If we write

\[ p_k(t) = \prod_{j \neq k} \frac{t - \alpha_j}{\alpha_k - \alpha_j}, \]

then p_k is a polynomial with all the required properties.
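The interpolation polynomial p_k can be evaluated at A directly, without computing any proper vectors. The following sketch does this for a hypothetical symmetric matrix with proper values 1 and 3 (an illustrative choice, not from the text); the function name is an assumption.

```python
def matmul(p, q):
    """Product of two square matrices given as lists of rows."""
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def lagrange_projection(a, alphas, k):
    """Evaluate p_k(A): the product over j != k of (A - alpha_j)/(alpha_k - alpha_j)."""
    n = len(a)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    for j, alpha_j in enumerate(alphas):
        if j == k:
            continue
        scale = alphas[k] - alpha_j
        factor = [[(a[r][c] - (alpha_j if r == c else 0.0)) / scale
                   for c in range(n)] for r in range(n)]
        result = matmul(result, factor)
    return result

# Hypothetical self-adjoint transformation with proper values 1 and 3.
A = [[2.0, 1.0],
     [1.0, 2.0]]
alphas = [1.0, 3.0]

E1 = lagrange_projection(A, alphas, 0)   # projection belonging to 1
E2 = lagrange_projection(A, alphas, 1)   # projection belonging to 3
```

The results are the perpendicular projections on the lines spanned by (1, −1) and (1, 1), which is what the theorem predicts for this matrix.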


THEOREM 3. If Σ_{j=1}^{r} α_j E_j is the spectral form of a self-adjoint
transformation A on a finite-dimensional inner product space, then a necessary
and sufficient condition that a linear transformation B commute with A
is that it commute with each E_j.


PROOF. The sufficiency of the condition is trivial; if A = Σ_j α_j E_j and
E_j B = BE_j for all j, then AB = BA. Necessity follows from Theorem 2;
if B commutes with A, then B commutes with every polynomial in A, and
therefore B commutes with each E_j.

Before exploiting the spectral theorem any further, we remark on its
matricial interpretation. If we choose an orthonormal basis in the range
of each E_j, then the totality of the vectors in these little bases is a basis
for the whole space; expressed in this basis the matrix of A will be diagonal.
The fact that by a suitable choice of an orthonormal basis the matrix
of a self-adjoint transformation can be made diagonal, or, equivalently,
that any self-adjoint matrix can be isometrically transformed (that is,
replaced by [U]⁻¹[A][U], where U is an isometry) into a diagonal matrix,
already follows (in the complex case) from the theory of the triangular
form. We gave the algebraic version for two reasons. First, it is this
version that generalizes easily to the infinite-dimensional case, and,
second, even in the finite-dimensional case, writing Σ_j α_j E_j often has
great notational and typographical advantages over the matrix notation.

We shall make use of the fact that a not necessarily self-adjoint trans- 
formation A is isometrically diagonable (that is, that its matrix with respect 
to a suitable orthonormal basis is diagonal) if and only if conditions (1)-(4)
of Theorem 1 hold for it. Indeed, if we have (1)-(4), then the proof of
diagonability, given for self-adjoint transformations, applies; the converse 
we leave as an exercise for the reader. 


EXERCISES 


1. Suppose that A is a linear transformation on a complex inner product space. 
Prove that if A is Hermitian, then the linear factors of the minimal polynomial of 
A are distinct. Is the converse true? 


2. (a) Two linear transformations A and B on a unitary space are unitarily
equivalent if there exists a unitary transformation U such that A = U⁻¹BU.
(The corresponding concept in the real case is called orthogonal equivalence.) Prove 
that unitary equivalence is an equivalence relation. 

(b) Are A*A and AA* always unitarily equivalent? 

(c) Are A and A* always unitarily equivalent? 


3. Which of the following pairs of matrices are unitarily equivalent? 


1 1 0 0 
(a) G J and & o) 
001 4 2 
(b) ( 0 0) and ( 3 
100 0 0 
0 0 -1 0 0 
(e) (-1 0) and ( 3 o); 
0 0 4 


0 0 0 1 0 
(d) (- 1 o) and ( 0 0) . 
00 -1 00 1 


4. If two linear transformations are unitarily equivalent, then they are similar, 
and they are congruent; if two linear transformations are either similar or con- 
gruent, then they are equivalent. Show by examples that these implication rela- 
tions are the only ones that hold among these concepts. 


§ 80. Normal transformations


The easiest (and at the same time the most useful) generalizations of 
the spectral theorem apply to complex inner product spaces (that is, 
unitary spaces). In order to avoid irrelevant complications, in this section 
we exclude the real case and concentrate attention on unitary spaces 
only. 

We have seen that every Hermitian transformation is diagonable, and 
that an arbitrary transformation A may be written in the form B + iC,
with B and C Hermitian; why isn’t it true that simply by diagonalizing 
B and C separately we can diagonalize A? The answer is, of course, that 
diagonalization involves the choice of a suitable orthonormal basis, and 
there is no reason to expect that a basis that diagonalizes B will have the 
same effect on C. It is of considerable importance to know the precise 
class of transformations for which the spectral theorem is valid, and 
fortunately this class is easy to describe. 

We shall call a linear transformation A normal if it commutes with its 
adjoint, A*A = AA*. (This definition makes sense, and is used, in both 
real and complex inner product spaces; we shall, however, continue to use 
techniques that are inextricably tied up with the complex case.) We 
point out first that A is normal if and only if its real and imaginary parts 
commute. Suppose, indeed, that A is normal and that A = B + iC with
B and C Hermitian; since B = (1/2)(A + A*) and C = (1/2i)(A − A*), it is
clear that BC = CB. If, conversely, BC = CB, then the two relations
A = B + iC and A* = B — iC imply that A is normal. We observe that 
Hermitian and unitary transformations are normal. 

The class of transformations possessing a spectral form in the sense of 
§ 79 is precisely the class of normal transformations. Half of this statement 
is easy to prove: if A = Σ_j α_j E_j, then A* = Σ_j ᾱ_j E_j, and it takes merely
a simple computation to show that A*A = AA* = Σ_j |α_j|² E_j. To prove
the converse, that is, to prove that normality implies the existence of a 
spectral form, we have two alternatives. We could derive this result from 
the spectral theorem for Hermitian transformations, using the real and 
imaginary parts, or we could prove that the essential lemmas of § 78, on 
which the proof of the Hermitian case rests, are just as valid for an arbitrary 
normal transformation. Because its methods are of some interest, we 
adopt the second procedure. We observe that the machinery needed to 
prove the lemmas that follow was available to us in § 78, so that we 
could have stated the spectral theorem for normal transformations
immediately; the main reason we traveled the present course was to motivate
the definition of normality.


THEOREM 1. If A is normal, then a necessary and sufficient condition that
x be a proper vector of A is that it be a proper vector of A*; if Ax = λx,
then A*x = λ̄x.


PROOF. We observe that the normality of A implies that

\[ \|Ax\|^2 = (Ax, Ax) = (A^{*}Ax, x) = (AA^{*}x, x) = (A^{*}x, A^{*}x) = \|A^{*}x\|^2. \]

Since A − λ is normal along with A, and since (A − λ)* = A* − λ̄, we
obtain the relation

\[ \|Ax - \lambda x\| = \|A^{*}x - \bar{\lambda} x\|, \]

from which the assertions of the theorem follow immediately.
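A quick numerical illustration may help here. The sketch below uses a hypothetical normal matrix that is neither Hermitian nor unitary (the matrix, the test vectors, and the helper names are all assumptions made for illustration). It checks that A and its adjoint give vectors of equal norm, and that a proper vector of A with proper value λ is a proper vector of the adjoint with the conjugate proper value.

```python
def adjoint(matrix):
    """Conjugate transpose of a square matrix given as a list of rows."""
    n = len(matrix)
    return [[complex(matrix[j][i]).conjugate() for j in range(n)]
            for i in range(n)]

def apply(matrix, vector):
    """Multiply a square matrix by a vector."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]

def norm_squared(vector):
    """Squared norm of a complex vector."""
    return sum((c * c.conjugate()).real for c in vector)

# Hypothetical normal transformation: A*A = AA* = 2 (twice the identity).
A = [[1, -1],
     [1, 1]]
A_star = adjoint(A)

# Equality of norms for an arbitrary sample vector.
v = [2, 1 + 3j]
norm_Av = norm_squared(apply(A, v))
norm_A_star_v = norm_squared(apply(A_star, v))

# A proper vector of A with proper value 1 + i.
x = [1, -1j]
lam = 1 + 1j
```

For this matrix the proper values are 1 + i and 1 − i, so it is visibly not Hermitian, and their absolute value is not one, so it is not unitary either.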


THEOREM 2. If A is normal, then proper vectors belonging to distinct 
proper values are orthogonal. 


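PROOF. If Ax_1 = λ_1 x_1 and Ax_2 = λ_2 x_2, then

\[ \lambda_1 (x_1, x_2) = (Ax_1, x_2) = (x_1, A^{*}x_2) = \lambda_2 (x_1, x_2). \]

The last step uses A*x_2 = λ̄_2 x_2 (Theorem 1); if λ_1 ≠ λ_2, it follows
that (x_1, x_2) = 0.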


This theorem generalizes § 78, Theorem 4; in the proof of the spectral 
theorem for Hermitian transformations we needed also § 78, Theorems 
5 and 6. The following result takes the place of the first of these. 


THEOREM 3. If A is normal, λ is a proper value of A, and M is the set
of all solutions of Ax = λx, then both M and M⊥ are invariant under A.


PROOF. The fact that M is invariant under A we have seen before;
this has nothing to do with normality. To prove that M⊥ is also invariant
under A, it is sufficient to prove that M is invariant under A*. This is
easy; if x is in M, then

\[ A(A^{*}x) = A^{*}(Ax) = \lambda (A^{*}x), \]

so that A*x is also in M.

This theorem is much weaker than its correspondent in § 78. The im- 
portant thing to observe, however, is that the proof of § 78, Theorem 6, 
depended only on the correspondingly weakened version of Theorem 5; 
the only subspaces that need to be considered are the ones of the type 
mentioned in the preceding theorem. 

This concludes the spade work; the spectral theorem for normal operators 
follows just as before in the Hermitian case. If in the theorems of § 79 
we replace the word “self-adjoint” by “normal,” delete all references to 
reality, and insist that the underlying inner product space be complex, 
the remaining parts of the statements and all the proofs remain unchanged. 

It is the theory of normal transformations that is of chief interest in the
study of unitary spaces. One of the most useful facts about normal
transformations is that spectral conditions of the type given in § 78, Theorems
1 and 3, there shown to be necessary for the self-adjoint, positive, and
isometric character of a transformation, are in the normal case also
sufficient.


THEOREM 4. A normal transformation on a finite-dimensional unitary
space is (1) Hermitian, (2) positive, (3) strictly positive, (4) unitary, (5)
invertible, (6) idempotent if and only if all its proper values are (1') real,
(2') positive, (3') strictly positive, (4') of absolute value one, (5') different
from zero, (6') equal to zero or one.


PROOF. The fact that (1), (2), (3), and (4) imply (1'), (2'), (3'), and
(4'), respectively, follows from § 78. If A is invertible and Ax = λx,
with x ≠ 0, then x = A⁻¹Ax = λA⁻¹x, and therefore λ ≠ 0; this proves
that (5) implies (5'). If A is idempotent and Ax = λx, with x ≠ 0, then
λx = Ax = A²x = λ²x, so that (λ − λ²)x = 0 and therefore λ = λ²; this
proves that (6) implies (6'). Observe that these proofs are valid for an
arbitrary inner product space (not even necessarily finite-dimensional)
and that the auxiliary assumption that A is normal is also superfluous.

Suppose now that the spectral form of A is Σ_j α_j E_j. Since A* =
Σ_j ᾱ_j E_j, we see that (1') implies (1). Since

\[ (Ax, x) = \sum_j \alpha_j (E_j x, x) = \sum_j \alpha_j \|E_j x\|^2, \]

it follows that (2') implies (2). If α_j > 0 for all j and if (Ax, x) = 0,
then we must have E_j x = 0 for all j, and therefore x = Σ_j E_j x = 0;
this proves that (3') implies (3). The implication from (4') to (4) follows
from the relation

\[ A^{*}A = \sum_j |\alpha_j|^2 E_j. \]

If α_j ≠ 0 for all j, we may form the linear transformation
B = Σ_j (1/α_j) E_j; since AB = BA = 1, it follows that (5') implies (5).
Finally A² = Σ_j α_j² E_j; from this we infer that (6') implies (6).

We observe that the implication relations (5) ⇒ (5'), (2) ⇒ (2'), and
(3') ⇒ (3) together fulfill a promise we made in § 72; if A is positive and
invertible, then A is strictly positive.


EXERCISES 


1. Give an example of a normal transformation that is neither Hermitian nor
unitary.


2. (a) If A is an arbitrary linear transformation (on a finite-dimensional unitary
space), and if α and β are complex numbers such that |α| = |β| = 1, then αA +
βA* is normal.

(b) If ||Ax|| = ||A*x|| for all x, then A is normal.

(c) Is the sum of two normal transformations always normal?


3. If A is a normal transformation on a finite-dimensional unitary space and if
M is a subspace invariant under A, then the restriction of A to M is also normal.


4. A linear transformation A on a finite-dimensional unitary space V is normal
if and only if AM ⊂ M implies AM⊥ ⊂ M⊥ for every subspace M of V.


5. (a) If A is normal and idempotent, then it is self-adjoint.

(b) If A is normal and nilpotent, then it is zero.

(c) If A is normal and A³ = A², then A is idempotent. Does the conclusion
remain true if the assumption of normality is omitted?

(d) If A is self-adjoint and if A^k = 1 for some strictly positive integer k, then
A² = 1.


6. If A and B are normal and if AB = 0, does it follow that BA = 0? 


7. Suppose that A is a linear transformation on an n-dimensional unitary space;
let λ_1, ..., λ_n be the proper values of A (each occurring a number of times equal to
its algebraic multiplicity). Prove that

\[ \sum_i |\lambda_i|^2 \le \operatorname{tr}(A^{*}A), \]

and that A is normal if and only if equality holds.


8. The numerical range of a linear transformation A on a finite-dimensional
unitary space is the set W(A) of all complex numbers of the form (Ax, x), with
||x|| = 1.

(a) If A is normal, then W(A) is convex. (This means that if ξ and η are in
W(A) and if 0 ≤ α ≤ 1, then αξ + (1 − α)η is also in W(A).)

(b) If A is normal, then every extreme point of W(A) is a proper value of A.
(An extreme point is one that does not have the form αξ + (1 − α)η for any ξ
and η in W(A) and for any α properly between 0 and 1.)

(c) It is known that the conclusion of (a) remains true even if normality is not
assumed. This fact can be phrased as follows: if A_1 and A_2 are Hermitian
transformations, then the set of all points of the form ((A_1 x, x), (A_2 x, x)) in the real
coordinate plane (with ||x|| = 1) is convex. Show that the generalization of this
assertion to more than two Hermitian transformations is false.

(d) Prove that the conclusion of (b) may be false for non-normal transformations.


§ 81. Orthogonal transformations 


Since a unitary transformation on a unitary space is normal, the results 
of the preceding section include the theory of unitary transformations as 
a special case. Since, however, an orthogonal transformation on a real 
inner product space need not have any proper values, the spectral theorem, 
as we know it so far, gives us no information about orthogonal transforma- 
tions. It is not difficult to get at the facts; the theory of complexification 
was made to order for this purpose. 

Suppose that U is an orthogonal transformation on a finite-dimensional
real inner product space V; let U⁺ be the extension of U to the
complexification V⁺. Since U*U = 1 (on V), it follows that (U⁺)*U⁺ = 1 (on V⁺),
that is, that U⁺ is unitary.

Let λ = α + iβ be a complex number (α and β real), and let M be the
subspace consisting of all solutions of U⁺z = λz in V⁺. (If λ is not a
proper value of U⁺, then M = 0.) If z is in M, write z = x + iy, with
x and y in V. The equation

\[ Ux + iUy = (\alpha + i\beta)(x + iy) \]

implies (cf. § 77) that

\[ Ux = \alpha x - \beta y \]

and

\[ Uy = \beta x + \alpha y. \]

If we multiply the second of the last pair of equations by i and then
subtract it from the first, we obtain

\[ Ux - iUy = (\alpha - i\beta)(x - iy). \]

This means that U⁺z̄ = λ̄z̄, where the suggestive and convenient symbol
z̄ denotes, of course, the vector x − iy. Since the argument (that is, the
passage from U⁺z = λz to U⁺z̄ = λ̄z̄) is reversible, we have proved that
the mapping z → z̄ is a one-to-one correspondence between M and the
subspace M̄ consisting of all solutions z̄ of U⁺z̄ = λ̄z̄. The result implies,
among other things, that the complex proper values of U⁺ come in pairs;
if λ is one of them, then so is λ̄. (This remark alone we could have
obtained more quickly from the fact that the coefficients of the characteristic
polynomial of U⁺ are real.)

We have not yet made use of the unitary character of U⁺. One way
we can make use of it is this. If λ is a complex (definitely not real) proper
value of U⁺, then λ ≠ λ̄; it follows that if U⁺z = λz, so that U⁺z̄ = λ̄z̄,
then z and z̄ are orthogonal. This means that

\[ 0 = (x + iy,\, x - iy) = \|x\|^2 - \|y\|^2 + i\bigl((x, y) + (y, x)\bigr), \]

and hence that ||x||² = ||y||² and (x, y) = −(y, x). Since a real inner
product is symmetric ((x, y) = (y, x)), it follows that (x, y) = 0. This,
in turn, implies that ||z||² = ||x||² + ||y||² and hence that

\[ \|x\| = \|y\| = \frac{1}{\sqrt{2}} \|z\|. \]

If λ_1 and λ_2 are proper values of U⁺ with λ_1 ≠ λ_2 and λ_1 ≠ λ̄_2, and
if z_1 = x_1 + iy_1 and z_2 = x_2 + iy_2 are corresponding proper vectors
(x_1, x_2, y_1, y_2 in V), then z_1 and z_2 are orthogonal and (since z̄_2 is a proper
vector belonging to the proper value λ̄_2) z_1 and z̄_2 are also orthogonal.
Using again the expression for the complex inner product on V⁺ in terms
of the real inner product on V, we see that

\[ (x_1, x_2) + (y_1, y_2) = (x_1, y_2) - (y_1, x_2) = 0 \]

and

\[ (x_1, x_2) - (y_1, y_2) = (x_1, y_2) + (y_1, x_2) = 0. \]

It follows that the four vectors x_1, x_2, y_1, and y_2 are pairwise orthogonal.

The unitary transformation U⁺ could have real proper values too. Since,
however, we know that the proper values of U⁺ have absolute value one,
it follows that the only possible real proper values of U⁺ are +1 and −1.
If U⁺(x + iy) = ±(x + iy), then Ux = ±x and Uy = ±y, so that the
proper vectors of U⁺ with real proper values are obtained by putting
together the proper vectors of U in an obvious manner.

We are now ready to take the final step. Given U, choose an orthonormal
basis, say X_1, in the linear manifold of solutions of Ux = x (in V), and,
similarly, choose an orthonormal basis, say X_{-1}, in the linear manifold of
solutions of Ux = −x (in V). (The sets X_1 and X_{-1} may be empty.)
Next, for each conjugate pair of complex proper values λ and λ̄ of U⁺,
choose an orthonormal basis {z_1, ..., z_r} in the linear manifold of solutions
of U⁺z = λz (in V⁺). If z_j = x_j + iy_j (with x_j and y_j in V), let X_λ be the
set {√2 x_1, √2 y_1, ..., √2 x_r, √2 y_r} of vectors in V. The results we
have obtained imply that if we form the union of all the sets X_1, X_{-1}, and
X_λ, for all proper values λ of U⁺, we obtain an orthonormal basis of V.
In case X_1 has three elements, X_{-1} has four elements, and there are two
conjugate pairs {λ_1, λ̄_1} and {λ_2, λ̄_2}, then the matrix of U with respect to
the basis so constructed looks like this:

(All terms not explicitly indicated are equal to zero.) In general, there is
a string of +1's on the main diagonal, followed by a string of −1's, and
then there is a string of two-by-two boxes running down the diagonal,
each box having the form

\[ \begin{pmatrix} \alpha & \beta \\ -\beta & \alpha \end{pmatrix}, \]

with α² + β² = 1. The fact that α² + β² = 1 implies that we can find a
real number θ such that α = cos θ and β = sin θ; it is customary to use
this trigonometric representation in writing the canonical form of the
matrix of an orthogonal transformation.
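The block structure just described can be assembled mechanically. The following sketch is a hypothetical helper, not anything from the text. It builds the canonical matrix for the case of three proper values +1, four proper values −1, and two conjugate pairs, assuming for illustration that each pair contributes exactly one two-by-two box and taking the angles π/3 and π/4 as arbitrary sample values.

```python
import math

def rotation_box(theta):
    """Two-by-two box with alpha = cos(theta) and beta = sin(theta)."""
    alpha, beta = math.cos(theta), math.sin(theta)
    return [[alpha, beta],
            [-beta, alpha]]

def canonical_form(n_plus, n_minus, thetas):
    """Block-diagonal matrix: n_plus entries +1, n_minus entries -1,
    then one two-by-two box for each angle in thetas."""
    size = n_plus + n_minus + 2 * len(thetas)
    m = [[0.0] * size for _ in range(size)]
    for i in range(n_plus):
        m[i][i] = 1.0
    for i in range(n_plus, n_plus + n_minus):
        m[i][i] = -1.0
    offset = n_plus + n_minus
    for theta in thetas:
        box = rotation_box(theta)
        for r in range(2):
            for c in range(2):
                m[offset + r][offset + c] = box[r][c]
        offset += 2
    return m

def is_orthogonal(m, tol=1e-12):
    """Check that the columns of m are orthonormal, up to rounding."""
    n = len(m)
    return all(
        abs(sum(m[k][i] * m[k][j] for k in range(n)) - (i == j)) < tol
        for i in range(n) for j in range(n)
    )

# Sample angles are assumptions; any real angles would do.
U = canonical_form(3, 4, [math.pi / 3, math.pi / 4])
```

The resulting eleven-by-eleven matrix is orthogonal, as every matrix of this shape must be.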


EXERCISES 


1. Every proper value of an orthogonal transformation has absolute value 1. 


2.If A = ¢ o) , how many (real) orthogonal matrices P are there with the 


property that P⁻¹AP is diagonal?


3. State and prove a sensible analogue of the spectral theorem for normal
transformations on a real inner product space.


§ 82. Functions of transformations 


One of the most useful concepts in the theory of normal transformations
on unitary spaces is that of a function of a transformation. If A is a
normal transformation with spectral form Σ_j α_j E_j (for this discussion we
temporarily assume that the underlying vector space is a unitary space),
and if f is an arbitrary complex-valued function defined at least at the
points α_j, then we define a linear transformation f(A) by

\[ f(A) = \sum_j f(\alpha_j) E_j. \]

Since for polynomials $p$ (and even for rational functions) we have already
seen that our earlier definition of $p(A)$ yields, if $A$ is normal,
$p(A) = \sum_j p(\alpha_j) E_j$, we see that the new notion is a generalization of the old one.
The advantage of considering $f(A)$ for arbitrary functions $f$ is for us largely
notational; it introduces nothing conceptually new. Indeed, for an
arbitrary $f$, we may write $f(\alpha_j) = \beta_j$, and then we may find a polynomial
$p$ that at the finite set of distinct complex numbers $\alpha_j$ takes, respectively,
the values $\beta_j$. With this polynomial $p$ we have $f(A) = p(A)$, so that the
class of transformations defined by the formation of arbitrary functions
is nothing essentially new; it only saves the trouble of constructing a
polynomial to fit each special case. Thus for example, if, for each complex
number $\lambda$, we write

$$f_\lambda(\zeta) = 0 \quad \text{whenever } \zeta \neq \lambda$$

and

$$f_\lambda(\lambda) = 1,$$

then $f_\lambda(A)$ is the perpendicular projection on the subspace of solutions
of $Ax = \lambda x$.
We observe that if $f(\zeta) = 1/\zeta$, then (assuming of course that $f$ is defined
for all $\alpha_j$, that is, that $\alpha_j \neq 0$) $f(A) = A^{-1}$, and if $f(\zeta) = \bar\zeta$, then $f(A) = A^*$.
These statements imply that if $f$ is an arbitrary rational function of $\zeta$ and $\bar\zeta$,
we obtain $f(A)$ by the replacements $\zeta \to A$, $\bar\zeta \to A^*$, and $1/\zeta \to A^{-1}$.
The symbol $f(A)$ is, however, defined for much more general functions,
and in the sequel we shall feel free to make use of expressions such as $e^A$
and $\sqrt{A}$.
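The definition $f(A) = \sum_j f(\alpha_j)E_j$ can be tried out numerically. The following is a minimal sketch, not part of the original text, restricted (as an assumption of the sketch) to real symmetric 2-by-2 matrices with two distinct proper values; the names `spectral_form`, `apply_function`, and `matmul` are hypothetical helpers introduced only for this illustration. Each projection is computed as a polynomial in $A$, in line with the remark that $f(A) = p(A)$ for a suitable polynomial $p$.

```python
import math


def spectral_form(a):
    """Return [(alpha_1, E_1), (alpha_2, E_2)] for a real symmetric 2x2 matrix.

    Assumption of this sketch: the two proper values are distinct.
    """
    (p, q), (q2, r) = a
    if q != q2:
        raise ValueError("matrix must be symmetric")
    mean = (p + r) / 2
    radius = math.hypot((p - r) / 2, q)
    if radius == 0:
        raise ValueError("proper values must be distinct")
    alphas = (mean - radius, mean + radius)

    def projection(own, other):
        # E = (A - other) / (own - other): a polynomial in A that takes
        # the value 1 at `own` and 0 at `other`.
        return [[(a[i][j] - (other if i == j else 0)) / (own - other)
                 for j in range(2)] for i in range(2)]

    return [(alphas[0], projection(alphas[0], alphas[1])),
            (alphas[1], projection(alphas[1], alphas[0]))]


def apply_function(f, a):
    """Compute f(A) as the sum of f(alpha_j) * E_j over the spectral form."""
    form = spectral_form(a)
    return [[sum(f(alpha) * e[i][j] for alpha, e in form) for j in range(2)]
            for i in range(2)]


def matmul(x, y):
    """Product of two 2x2 matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


A = [[2.0, 1.0], [1.0, 2.0]]                   # proper values 1 and 3
root = apply_function(math.sqrt, A)            # f(t) = sqrt(t)
inverse = apply_function(lambda t: 1 / t, A)   # f(t) = 1/t
```

Squaring `root` with `matmul` recovers `A` up to rounding, and `matmul(inverse, A)` is the identity up to rounding, which is the numerical counterpart of the replacement $1/\zeta \to A^{-1}$.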

A particularly important function is the square root of positive
transformations. We consider $f(\zeta) = \sqrt{\zeta}$, defined for all real $\zeta \ge 0$, as the
positive square root of $\zeta$, and for every positive $A = \sum_j \alpha_j E_j$ we write

$$\sqrt{A} = \sum_j \sqrt{\alpha_j}\, E_j.$$

(Recall that $\alpha_j \ge 0$ for all $j$. The discussion that follows applies to both
real and complex inner product spaces.) It is clear that $\sqrt{A} \ge 0$ and
that $(\sqrt{A})^2 = A$; we should like to investigate the extent to which these
properties characterize $\sqrt{A}$. At first glance it may seem hopeless to look
for any uniqueness, since if we consider $B = \sum_j \pm\sqrt{\alpha_j}\, E_j$, with an
arbitrary choice of sign in each place, we still have $A = B^2$. The
transformation $\sqrt{A}$ that we constructed, however, was positive, and we can
show that this additional property guarantees uniqueness. In other
words: if $A = B^2$ and $B \ge 0$, then $B = \sqrt{A}$. To prove this, let
$B = \sum_k \beta_k F_k$ be the spectral form of $B$; then

$$\sum_k \beta_k^2 F_k = B^2 = A = \sum_j \alpha_j E_j.$$

Since the $\beta_k$ are distinct and positive, so also are the $\beta_k^2$; the uniqueness
of the spectral form of $A$ implies that each $\beta_k^2$ is equal to some $\alpha_j$ (and
vice versa), and that the corresponding $E$'s and $F$'s are equal. By a
permutation of the indices we may therefore achieve $\beta_j^2 = \alpha_j$ for all $j$,
so that $\beta_j = \sqrt{\alpha_j}$, as was to be shown.
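A small worked illustration may help here; the matrix is chosen for this note and does not come from the text. It has proper values 1 and 9, so the sign choices $\pm E_1 \pm 3E_2$ give four self-adjoint transformations whose square is $A$, and exactly one of them is positive.

```latex
A = \begin{pmatrix} 5 & 4 \\ 4 & 5 \end{pmatrix}
  = 1 \cdot E_1 + 9 \cdot E_2,
\qquad
E_1 = \tfrac12 \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix},
\quad
E_2 = \tfrac12 \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}.

% Positive square root: both coefficients are taken with the plus sign.
\sqrt{A} = E_1 + 3E_2 = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix},
\qquad
(\sqrt{A})^2 = \begin{pmatrix} 5 & 4 \\ 4 & 5 \end{pmatrix}.

% Another square root, self-adjoint but not positive (proper values -1 and 3):
-E_1 + 3E_2 = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix},
\qquad
\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}^2
  = \begin{pmatrix} 5 & 4 \\ 4 & 5 \end{pmatrix}.
```

The remaining two sign choices are the negatives of the two matrices displayed, and each of them has a negative proper value.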

There are several important applications of the existence of square 
roots for positive operators; we shall now give two of them. 

First: we recall that in § 72 we mentioned three possible definitions of
a positive transformation $A$, and adopted the weakest one, namely, that
$A$ is self-adjoint and $(Ax, x) \ge 0$ for all $x$. The strongest of the three
possible definitions was that we could write $A$ in the form $A = B^2$ for
some self-adjoint $B$. We point out that the result of this section
concerning square roots implies that the (seemingly) weakest one of our
conditions implies and is therefore equivalent to the strongest. (In fact,
we can even obtain a unique positive square root.)

Second: in § 72 we stated also that if $A$ and $B$ are positive and
commutative, then $AB$ is also positive; we can now give an easy proof of this
assertion. Since $\sqrt{A}$ and $\sqrt{B}$ are functions of (polynomials in) $A$ and $B$
respectively, the commutativity of $A$ and $B$ implies that $\sqrt{A}$ and $\sqrt{B}$
commute with each other; consequently

$$AB = \sqrt{A}\sqrt{A}\sqrt{B}\sqrt{B} = \sqrt{A}\sqrt{B}\sqrt{A}\sqrt{B} = (\sqrt{A}\sqrt{B})^2.$$

Since $\sqrt{A}$ and $\sqrt{B}$ are self-adjoint and commutative, their product is
self-adjoint and therefore its square is positive.

Spectral theory also makes it quite easy to characterize the matrix (with
respect to an arbitrary orthonormal coordinate system) of a positive
transformation $A$. Since $\det A$ is the product of the proper values of $A$,
it is clear that $A \ge 0$ implies $\det A \ge 0$. (The discussion in § 55 applies
directly to complex inner product spaces only; the appropriate modification
needed for the discussion of self-adjoint transformations on possibly real
spaces is, however, quite easy to supply.) If we consider the defining
property of positiveness expressed in terms of the matrix $(\alpha_{ij})$ of $A$, that
is, $\sum_i \sum_j \alpha_{ij}\bar\xi_i\xi_j \ge 0$, we observe that the last expression remains positive
if we restrict the coordinates $(\xi_1, \dots, \xi_n)$ by requiring that certain ones
of them vanish. In terms of the matrix this means that if we cross out
the columns numbered $j_1, \dots, j_k$, say, and cross out also the rows bearing
the same numbers, the remaining small matrix is still positive, and
consequently so is its determinant. This fact is usually expressed by saying that
the principal minors of the determinant of a positive matrix are positive.
The converse is true. The coefficient of the $j$-th power of $\lambda$ in the
characteristic polynomial $\det(A - \lambda)$ of $A$ is (except for sign) the sum of all
principal minors of $n - j$ rows and columns. The sign is alternately plus
and minus; this implies that if $A$ has positive principal minors and is
self-adjoint (so that the zeros of $\det(A - \lambda)$ are known to be real), then
the proper values of $A$ are positive. Since the self-adjoint character of a
matrix is ascertainable by observing whether or not it is (Hermitian)
symmetric ($\alpha_{ij} = \bar\alpha_{ji}$), our comments reduce the problem of finding out whether
or not a matrix is positive to a finite number of elementary computations.
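The "finite number of elementary computations" can be spelled out as follows. This is a minimal sketch, not part of the original text, and it assumes a real symmetric matrix with integer or rational entries so that the arithmetic is exact; the function names `det`, `is_symmetric`, `principal_minors`, and `is_positive` are hypothetical. "Positive" is used in the sense of the text, that is, $(Ax, x) \ge 0$, so a minor equal to zero is allowed.

```python
from fractions import Fraction
from itertools import combinations


def det(m):
    """Determinant by expansion along the first row (fine for small matrices)."""
    n = len(m)
    if n == 1:
        return Fraction(m[0][0])
    total = Fraction(0)
    for j in range(n):
        # Cross out row 0 and column j.
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def is_symmetric(m):
    n = len(m)
    return all(m[i][j] == m[j][i] for i in range(n) for j in range(n))


def principal_minors(m):
    """Yield (kept_indices, determinant) for every non-empty set of indices.

    Keeping the index set `keep` is the same as crossing out the rows and
    the columns that bear the remaining numbers.
    """
    n = len(m)
    for size in range(1, n + 1):
        for keep in combinations(range(n), size):
            sub = [[m[i][j] for j in keep] for i in keep]
            yield keep, det(sub)


def is_positive(m):
    """True when m is symmetric and every principal minor is >= 0."""
    return is_symmetric(m) and all(d >= 0 for _, d in principal_minors(m))
```

For instance, `is_positive([[2, -1], [-1, 2]])` holds, while `is_positive([[1, 2], [2, 1]])` fails because the full determinant is negative. The matrix `[[0, 0], [0, -1]]` shows why all principal minors are examined and not only the upper-left ones: its upper-left minors are both zero, yet the minor obtained by keeping only the second row and column is -1.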
EXERCISES


1. Corresponding to every unitary transformation $U$ there is a Hermitian
transformation $A$ such that $U = e^{iA}$.


2. Discuss the theory of functions of a normal transformation on a real inner 
product space. 


3. If $A \le B$ and if $C$ is a positive transformation that commutes with both $A$
and $B$, then $AC \le BC$.


4. A self-adjoint transformation has a unique self-adjoint cube root.


5. Find all Hermitian cube roots of the matrix

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 8 \end{pmatrix}.$$


6. (a) Give an example of a linear transformation A on a finite-dimensional 
unitary space such that A has no square root. 

(b) Prove that every Hermitian transformation on a finite-dimensional unitary 
space has a square root. 

(c) Does every self-adjoint transformation on a finite-dimensional Euclidean 
space have a square root? 

7. (a) Prove that if $A$ is a positive linear transformation on a finite-dimensional
inner product space, then $\rho(\sqrt{A}) = \rho(A)$.
(b) If $A$ is a linear transformation on a finite-dimensional inner product space,
is it true that $\rho(A^*A) = \rho(A)$?


8. If $A \ge 0$ and if $(Ax, x) = 0$ for some $x$, then $Ax = 0$.

9. If $A \ge 0$, then $|(Ax, y)|^2 \le (Ax, x)(Ay, y)$ for all $x$ and $y$.


10. If the vectors $x_1, \dots, x_k$ are linearly independent, then their Gramian is
non-singular.


11. Every positive matrix is a Gramian. 


12. If $A$ and $B$ are linear transformations on a finite-dimensional inner product
space, and if $0 \le A \le B$, then $\det A \le \det B$. (Hint: the conclusion is trivial if
$\det B = 0$; if $\det B \neq 0$, then $\sqrt{B}$ is invertible.)


13. If a linear transformation $A$ on a finite-dimensional inner product space is
strictly positive and if $A \le B$, then $B^{-1} \le A^{-1}$. (Hint: try $A = 1$ first.)


14. (a) If $B$ is a Hermitian transformation on a finite-dimensional unitary space,
then $1 + iB$ is invertible.
(b) If $A$ is positive and invertible and if $B$ is Hermitian, then $A + iB$ is invertible.


15. If $0 \le A \le B$, then $\sqrt{A} \le \sqrt{B}$. (Hint: compute
$$(\sqrt{B} + \sqrt{A} + \epsilon)(\sqrt{B} - \sqrt{A} + \epsilon),$$
and prove thereby that the second factor is invertible whenever $\epsilon > 0$.)




16. Suppose that $A$ is a self-adjoint transformation on a finite-dimensional
inner product space; write $|A| = \sqrt{A^2}$, $A_+ = \frac12(|A| + A)$, and $A_- = \frac12(|A| - A)$.

(a) Prove that $|A|$ is the smallest Hermitian transformation that commutes
with $A$ and for which both $A \le |A|$ and $-A \le |A|$. ("Smallest" refers, of
course, to the ordering of Hermitian transformations.)

(b) Prove that $A_+$ is the smallest positive transformation that commutes with
$A$ and for which $A \le A_+$.

(c) Prove that $A_-$ is the smallest positive transformation that commutes with
$A$ and for which $-A \le A_-$.

(d) Prove that if $A$ and $B$ are self-adjoint and commutative, then there exists a
smallest self-adjoint transformation $C$ that commutes with both $A$ and $B$ and for
which both $A \le C$ and $B \le C$.


17. (a) If $A$ and $B$ are positive linear transformations on a finite-dimensional
unitary space, and if $A^2$ and $B^2$ are unitarily equivalent, then $A$ and $B$ are unitarily
equivalent.

(b) Is the real analogue of (a) true? 


§ 83. Polar decomposition 


There is another useful consequence of the theory of square roots, namely,
the analogue of the polar representation $\zeta = \rho e^{i\theta}$ of a complex number.


THEOREM 1. If $A$ is an arbitrary linear transformation on a finite-dimensional
inner product space, then there is a (uniquely determined)
positive transformation $P$, and there is an isometry $U$, such that $A = UP$.
If $A$ is invertible, then $U$ also is uniquely determined by $A$.


PROOF. Although it is not logically necessary to do so, we shall first 
give the proof in case A is invertible; the general proof is an obvious 
modification of this special one, and the special proof gives greater in- 
sight into the geometric structure of the transformation A. 

Since the transformation $A^*A$ is positive, we may find its (unique)
positive square root, $P = \sqrt{A^*A}$. We write $V = PA^{-1}$; since $VA = P$,
the theorem will be proved if we can prove that $V$ is an isometry, for then
we may write $U = V^{-1}$. Since

$$V^* = (A^{-1})^*P^* = (A^*)^{-1}P,$$

we see that

$$V^*V = (A^*)^{-1}PPA^{-1} = (A^*)^{-1}A^*AA^{-1} = 1,$$

so that $V$ is an isometry, and we are done.

To prove uniqueness we observe that $UP = U_0P_0$ implies $PU^* = P_0U_0^*$
and therefore

$$P^2 = PU^*UP = P_0U_0^*U_0P_0 = P_0^2.$$

Since the positive transformation $P^2 = P_0^2$ has only one positive square
root, it follows that $P = P_0$. (In this part of the proof we did not use the
invertibility of $A$.) If $A$ is invertible, then so is $P$ (since $P = U^{-1}A$),
and from this we obtain (multiplying the relation $UP = U_0P_0$ on the
right by $P^{-1} = P_0^{-1}$) that $U = U_0$.

We turn now to the general case, where we do not assume that $A$ is
invertible. We form $P$ exactly the same way as above, so that $P^2 = A^*A$,
and then we observe that for every vector $x$ we have

$$\|Px\|^2 = (Px, Px) = (P^2x, x) = (A^*Ax, x) = \|Ax\|^2.$$

If for each vector $y = Px$ in the range $\mathcal{R}(P)$ of $P$ we write $Uy = Ax$, then
the transformation $U$ is length-preserving wherever it is defined. We
must show that $U$ is unambiguously determined, that is, that $Px_1 = Px_2$
implies $Ax_1 = Ax_2$. This is true since $P(x_1 - x_2) = 0$ is equivalent to
$\|P(x_1 - x_2)\| = 0$ and this latter condition implies $\|A(x_1 - x_2)\| = 0$.
The range of the transformation $U$, defined so far on the subspace $\mathcal{R}(P)$
only, is $\mathcal{R}(A)$. Since $U$ is linear, $\mathcal{R}(A)$ and $\mathcal{R}(P)$ have the same dimension,
and therefore $(\mathcal{R}(A))^\perp$ and $(\mathcal{R}(P))^\perp$ have the same dimension. If we define
$U$ on $(\mathcal{R}(P))^\perp$ to be any linear and isometric transformation of $(\mathcal{R}(P))^\perp$
onto $(\mathcal{R}(A))^\perp$, then $U$, thereby determined on all $\mathcal{V}$, is an isometry with
the property that $UPx = Ax$ for all $x$. This completes the proof.

Applying the theorem just proved to A* in place of A, and then taking 
adjoints, we obtain also the dual fact that every A may be written in the 
form A = PU with an isometric U and a positive P. In contrast with the 
Cartesian decomposition (§ 70), we call the representation A = UP a 
polar decomposition of A. 

In terms of polar decompositions we obtain a new characterization of 
normality. 


THEOREM 2. If $A = UP$ is a polar decomposition of the linear transformation
$A$, then a necessary and sufficient condition that $A$ be normal is that
$PU = UP$.


PROOF. Since $U$ is not necessarily uniquely determined by $A$, the statement
is to be interpreted as follows: if $A$ is normal, then $P$ commutes with
every $U$, and if $P$ commutes with some $U$, then $A$ is normal. Since
$AA^* = UP^2U^* = UP^2U^{-1}$ and $A^*A = P^2$, it is clear that $A$ is normal if and
only if $U$ commutes with $P^2$. Since, however, $P^2$ is a function of $P$ and
vice versa $P$ is a function of $P^2$ ($P = \sqrt{P^2}$), it follows that commuting
with $P^2$ is equivalent to commuting with $P$.
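A small worked illustration of both theorems of this section follows; the matrix is chosen for this note and is not taken from the text. It is real and invertible, so the polar decomposition is unique, and the adjoint is simply the transpose.

```latex
A = \begin{pmatrix} 0 & -2 \\ 1 & 0 \end{pmatrix},
\qquad
A^*A = \begin{pmatrix} 0 & 1 \\ -2 & 0 \end{pmatrix}
       \begin{pmatrix} 0 & -2 \\ 1 & 0 \end{pmatrix}
     = \begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix}.

% Positive factor and isometric factor:
P = \sqrt{A^*A} = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix},
\qquad
U = AP^{-1} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix},
\qquad
UP = \begin{pmatrix} 0 & -2 \\ 1 & 0 \end{pmatrix} = A.

% The factors do not commute, in agreement with the fact that A is not normal:
PU = \begin{pmatrix} 0 & -1 \\ 2 & 0 \end{pmatrix} \neq UP,
\qquad
AA^* = \begin{pmatrix} 4 & 0 \\ 0 & 1 \end{pmatrix} \neq A^*A.
```

Here $U$ is the rotation through a right angle and $P$ stretches the second coordinate, so that $A$ first stretches and then rotates.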
EXERCISES


1. If a linear transformation on a finite-dimensional inner product space has 
only one polar decomposition, then it is invertible. 


2. Use the functional calculus to derive the polar decomposition of a normal 
operator. 


3. (a) If A is an arbitrary linear transformation on a finite-dimensional inner 
product space, then there is a partial isometry U, and there is a positive transforma- 
tion P, such that N(U) = N(P) and such that A = UP. The transformations 
U and P are uniquely determined by these conditions. 

(b) The transformation A is normal if and only if the transformations U and P 
described in (a) commute with each other. 


§ 84. Commutativity 


The spectral theorem for self-adjoint and for normal operators and the 
functional calculus may also be used to solve certain problems concerning 
commutativity. This is a deep and extensive subject; more to illustrate 
some methods than for the actual results we discuss two theorems from it. 


THEOREM 1. Two self-adjoint transformations $A$ and $B$ on a finite-dimensional
inner product space are commutative if and only if there exists
a self-adjoint transformation $C$ and there exist two real-valued functions
$f$ and $g$ of a real variable so that $A = f(C)$ and $B = g(C)$. If such a $C$
exists, then we may even choose $C$ in the form $C = h(A, B)$, where $h$ is a
suitable real-valued function of two real variables.


PROOF. The sufficiency of the condition is clear; we prove only the 
necessity. 

Let $A = \sum_i \alpha_i E_i$ and $B = \sum_j \beta_j F_j$ be the spectral forms of $A$ and $B$;
since $A$ and $B$ commute, it follows from § 79, Theorem 3, that $E_i$ and $F_j$
commute. Let $h$ be any function of two real variables such that the
numbers $h(\alpha_i, \beta_j) = \gamma_{ij}$ are all distinct, and write

$$C = h(A, B) = \sum_i \sum_j h(\alpha_i, \beta_j) E_i F_j.$$

(It is clear that $h$ may even be chosen as a polynomial, and the same is
true of the functions $f$ and $g$ we are about to describe.) Let $f$ and $g$
be such that $f(\gamma_{ij}) = \alpha_i$ and $g(\gamma_{ij}) = \beta_j$ for all $i$ and $j$. It follows that
$f(C) = A$ and $g(C) = B$, and everything is proved.
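The construction in the proof can be followed on a small diagonal example; the matrices and the choice $h(\alpha, \beta) = 2\alpha + \beta$ are assumptions made for this illustration only.

```latex
A = \operatorname{diag}(1, 1, 2) = 1 \cdot E_1 + 2 \cdot E_2,
\qquad
B = \operatorname{diag}(3, 4, 4) = 3 \cdot F_1 + 4 \cdot F_2,

E_1 = \operatorname{diag}(1, 1, 0), \quad E_2 = \operatorname{diag}(0, 0, 1),
\qquad
F_1 = \operatorname{diag}(1, 0, 0), \quad F_2 = \operatorname{diag}(0, 1, 1).

% With h(\alpha, \beta) = 2\alpha + \beta the four numbers \gamma_{ij} are distinct:
\gamma_{11} = 5, \quad \gamma_{12} = 6, \quad \gamma_{21} = 7, \quad \gamma_{22} = 8.

% Since E_2 F_1 = 0, the value \gamma_{21} does not appear in C:
C = 5\,E_1F_1 + 6\,E_1F_2 + 7\,E_2F_1 + 8\,E_2F_2 = \operatorname{diag}(5, 6, 8).

% Any f and g with the following values recover A and B:
f(5) = f(6) = 1, \quad f(7) = f(8) = 2,
\qquad
g(5) = g(7) = 3, \quad g(6) = g(8) = 4,

f(C) = \operatorname{diag}(1, 1, 2) = A,
\qquad
g(C) = \operatorname{diag}(3, 4, 4) = B.
```

Neither $A$ nor $B$ alone has three distinct proper values, so neither is a function of the other, but both are functions of $C$.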


THEOREM 2. If $A$ is a normal transformation on a finite-dimensional
unitary space and if $B$ is an arbitrary transformation that commutes with
$A$, then $B$ commutes with $A^*$.

PROOF. Let $A = \sum_i \alpha_i E_i$ be the spectral form of $A$; then
$A^* = \sum_i \bar\alpha_i E_i$. Let $f$ be such a function (polynomial) of a complex variable
that $f(\alpha_i) = \bar\alpha_i$ for all $i$. Since $A^* = f(A)$, the conclusion follows.


EXERCISES


1. (a) Prove the following generalization of Theorem 2: if $A_1$ and $A_2$ are normal
transformations (on a finite-dimensional unitary space) and if $A_1B = BA_2$, then
$A_1^*B = BA_2^*$.

(b) Theorem 2 asserts that the relation of commutativity is sometimes transitive:
if $A^*$ commutes with $A$ and if $A$ commutes with $B$, then $A^*$ commutes with $B$.
Does this formulation remain true if $A^*$ is replaced by an arbitrary transformation
$C$?


2. (a) If A commutes with A*A, does it follow that A is normal? 
(b) If A*A commutes with AA*, does it follow that A is normal? 


3. (a) A linear transformation A is normal if and only if there exists a poly- 
nomial p such that A* = p(A). 

(b) If A is normal and commutes with B, then A commutes with B*. 

(c) If A and B are normal and commutative, then AB is normal. 


4. If A and B are normal and similar, then they are unitarily equivalent. 


5. (a) If A is Hermitian, if every proper value of A has multiplicity 1, and if 
AB = BA, then there exists a polynomial p such that B = p(A). 

(b) If $A$ is Hermitian, then a necessary and sufficient condition that there exist
a polynomial $p$ such that $B = p(A)$ is that $B$ commute with every linear
transformation that commutes with $A$.


6. Show that a commutative set of normal transformations on a finite-dimensional 
unitary space can be simultaneously diagonalized. 


§ 85. Self-adjoint transformations of rank one 


We have already seen (§ 51, Theorem 2) that every linear transformation 
A of rank p is the sum of p linear transformations of rank one. It is easy to 
see (using the spectral theorem) that if A is self-adjoint, or positive, then 
the summands may also be taken self-adjoint, or positive, respectively. 
We know (§ 51, Theorem 1) what the matrix of a transformation of rank one
has to be; what more can we say if the transformation is self-adjoint or
positive?


THEOREM 1. If $A$ has rank one and is self-adjoint (or positive), then in
every orthonormal coordinate system the matrix $(\alpha_{ij})$ of $A$ is given by
$\alpha_{ij} = \kappa\beta_i\bar\beta_j$ with a real $\kappa$ (or by $\alpha_{ij} = \gamma_i\bar\gamma_j$). If, conversely, $[A]$ has this form
in some orthonormal coordinate system, then $A$ has rank one and is
self-adjoint (or positive).
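PROOF. We know that the matrix $(\alpha_{ij})$ of a transformation $A$ of rank
one, in any orthonormal coordinate system $X = \{x_1, \dots, x_n\}$, is given by
$\alpha_{ij} = \beta_i\gamma_j$. If $A$ is self-adjoint, we must also have $\alpha_{ij} = \bar\alpha_{ji}$, whence
$\beta_i\gamma_j = \bar\beta_j\bar\gamma_i$. If $\beta_i = 0$ and $\gamma_i \neq 0$ for some $i$, then $\bar\beta_j = \beta_i\gamma_j/\bar\gamma_i = 0$ for
all $j$, whence $A = 0$. Since we assumed that the rank of $A$ is one (and not
zero), this is impossible. Similarly $\beta_i \neq 0$ and $\gamma_i = 0$ is impossible; that
is, we can find an $i$ for which $\beta_i\gamma_i \neq 0$. Using this $i$, we have

$$\gamma_j = (\bar\gamma_i/\beta_i)\bar\beta_j = \kappa\bar\beta_j$$

with some non-zero constant $\kappa$, independent of $j$. Since the diagonal
elements $\alpha_{jj} = (Ax_j, x_j) = \beta_j\gamma_j = \kappa\beta_j\bar\beta_j$ of a self-adjoint matrix are real, we can
even conclude that $\alpha_{ij} = \kappa\beta_i\bar\beta_j$ with a real $\kappa$.

If, moreover, $A$ is positive, then we even know that $\kappa\beta_j\bar\beta_j = \alpha_{jj} =
(Ax_j, x_j)$ is positive, and therefore so is $\kappa$. In this case we write $\lambda = \sqrt{\kappa}$;
the relation $\kappa\beta_i\bar\beta_j = (\lambda\beta_i)(\overline{\lambda\beta_j})$ shows that $\alpha_{ij}$ is given by $\alpha_{ij} = \gamma_i\bar\gamma_j$
(where now $\gamma_i = \lambda\beta_i$).

It is easy to see that these necessary conditions are also sufficient.
If $\alpha_{ij} = \kappa\beta_i\bar\beta_j$ with a real $\kappa$, then $A$ is self-adjoint. If $\alpha_{ij} = \gamma_i\bar\gamma_j$, and
$x = \sum_i \xi_i x_i$, then

$$(Ax, x) = \sum_i \sum_j \alpha_{ij}\bar\xi_i\xi_j = \sum_i \sum_j \gamma_i\bar\gamma_j\bar\xi_i\xi_j
= \Big(\sum_i \gamma_i\bar\xi_i\Big)\Big(\overline{\sum_j \gamma_j\bar\xi_j}\Big) = \Big|\sum_i \gamma_i\bar\xi_i\Big|^2 \ge 0,$$

so that $A$ is positive.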


As a consequence of Theorem 1 it is very easy to prove a remarkable 
theorem on positive matrices. 


THEOREM 2. If $A$ and $B$ are positive linear transformations whose matrices
in some orthonormal coordinate system are $(\alpha_{ij})$ and $(\beta_{ij})$ respectively, then
the linear transformation $C$, whose matrix $(\gamma_{ij})$ in the same coordinate
system is given by $\gamma_{ij} = \alpha_{ij}\beta_{ij}$ for all $i$ and $j$, is also positive.


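PROOF. Since we may write both $A$ and $B$ as sums of positive
transformations of rank one, so that

$$\alpha_{ij} = \sum_p \alpha_i^p\,\overline{\alpha_j^p}$$

and

$$\beta_{ij} = \sum_q \beta_i^q\,\overline{\beta_j^q},$$

it follows that

$$\gamma_{ij} = \sum_p \sum_q (\alpha_i^p\beta_i^q)\,\overline{(\alpha_j^p\beta_j^q)}.$$

(The superscripts here are not exponents.) Since a sum of positive
matrices is positive, it will be sufficient to prove that, for each fixed $p$ and
$q$, the matrix $\big((\alpha_i^p\beta_i^q)\,\overline{(\alpha_j^p\beta_j^q)}\big)$ is positive, and this follows from Theorem 1.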
The proof shows, by the way, that Theorem 2 remains valid if we
replace "positive" by "self-adjoint" in both hypothesis and conclusion; in
most applications, however, it is only the actually stated version that is
useful. The matrix $(\gamma_{ij})$ described in Theorem 2 is called the Hadamard
product of $(\alpha_{ij})$ and $(\beta_{ij})$.
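A small numerical illustration, with matrices chosen for this note only, contrasts the Hadamard product with the ordinary product. Both factors are real symmetric with positive diagonal entries and determinant 1, hence positive.

```latex
(\alpha_{ij}) = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix},
\qquad
(\beta_{ij}) = \begin{pmatrix} 1 & 2 \\ 2 & 5 \end{pmatrix}.

% Hadamard (entry-by-entry) product: symmetric, with principal minors 2, 5, and 6.
(\gamma_{ij}) = (\alpha_{ij}\beta_{ij})
  = \begin{pmatrix} 2 \cdot 1 & 1 \cdot 2 \\ 1 \cdot 2 & 1 \cdot 5 \end{pmatrix}
  = \begin{pmatrix} 2 & 2 \\ 2 & 5 \end{pmatrix},
\qquad
\det(\gamma_{ij}) = 10 - 4 = 6.

% Ordinary matrix product of the same two matrices: not even symmetric.
\begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 \\ 2 & 5 \end{pmatrix}
  = \begin{pmatrix} 4 & 9 \\ 3 & 7 \end{pmatrix}.
```

The two matrices do not commute, so their ordinary product is not self-adjoint, whereas the Hadamard product is positive whether or not the factors commute.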


EXERCISES 


1. Suppose that $\mathcal{U}$ and $\mathcal{V}$ are finite-dimensional inner product spaces (both real
or both complex).

(a) There is a unique inner product on the vector space of all bilinear forms on
$\mathcal{U} \oplus \mathcal{V}$ such that if $w_1(x, y) = (x, x_1)(y, y_1)$ and $w_2(x, y) = (x, x_2)(y, y_2)$, then
$(w_1, w_2) = (x_2, x_1)(y_2, y_1)$.

(b) There is a unique inner product on the tensor product $\mathcal{U} \otimes \mathcal{V}$ such that if
$z_1 = x_1 \otimes y_1$ and $z_2 = x_2 \otimes y_2$, then $(z_1, z_2) = (x_1, x_2)(y_1, y_2)$.

(c) If $\{x_i\}$ and $\{y_p\}$ are orthonormal bases in $\mathcal{U}$ and $\mathcal{V}$, respectively, then the
vectors $x_i \otimes y_p$ form an orthonormal basis in $\mathcal{U} \otimes \mathcal{V}$.


2. Is the tensor product of two Hermitian transformations necessarily Hermitian? 
What about unitary transformations? What about normal transformations? 


CHAPTER IV 


ANALYSIS 


§ 86. Convergence of vectors 


Essentially the only way in which we exploited, so far, the existence of 
an inner product in an inner product space was to introduce the notion of 
a normal transformation together with certain important special cases 
of it. A much more obvious circle of ideas is the study of the convergence 
problems that arise in an inner product space. 

Let us see what we might mean by the assertion that a sequence $(x_n)$
of vectors in $\mathcal{V}$ converges to a vector $x$ in $\mathcal{V}$. There are two possibilities
that suggest themselves:


(i) $\|x_n - x\| \to 0$ as $n \to \infty$;
(ii) $(x_n - x, y) \to 0$ as $n \to \infty$, for each fixed $y$ in $\mathcal{V}$.

If (i) is true, then we have, for every $y$,

$$|(x_n - x, y)| \le \|x_n - x\| \cdot \|y\| \to 0,$$

so that (ii) is true. In a finite-dimensional space the converse implication
is valid: (ii) implies (i). To prove this, let $\{z_1, \dots, z_N\}$ be an orthonormal
basis in $\mathcal{V}$. (Often in this chapter we shall write $N$ for the dimension of a
finite-dimensional vector space, in order to reserve $n$ for the dummy
variable in limiting processes.) If we assume (ii), then $(x_n - x, z_i) \to 0$
for each $i = 1, \dots, N$. Since (§ 63, Theorem 2)

$$\|x_n - x\|^2 = \sum_i |(x_n - x, z_i)|^2,$$

it follows that $\|x_n - x\| \to 0$, as was to be proved.

Concerning the convergence of vectors (in either of the two equivalent
senses) we shall use without proof the following facts. (All these facts are
easy consequences of our definitions and of the properties of convergence
in the usual domain of complex numbers; we assume that the reader has
a modicum of familiarity with these notions.) The expression $\alpha x + \beta y$
defines a continuous function of all its arguments simultaneously; that is,
if $(\alpha_n)$ and $(\beta_n)$ are sequences of numbers and $(x_n)$ and $(y_n)$ are sequences
of vectors, then $\alpha_n \to \alpha$, $\beta_n \to \beta$, $x_n \to x$, and $y_n \to y$ imply that
$\alpha_n x_n + \beta_n y_n \to \alpha x + \beta y$. If $\{z_i\}$ is an orthonormal basis in $\mathcal{V}$, and if
$x_n = \sum_i \alpha_{in} z_i$ and $x = \sum_i \alpha_i z_i$, then a necessary and sufficient condition
that $x_n \to x$ is that $\alpha_{in} \to \alpha_i$ (as $n \to \infty$) for each $i = 1, \dots, N$. (Thus
the notion of convergence here defined coincides with the usual one in
$N$-dimensional real or complex coordinate space.) Finally, we shall assume
as known the fact that a finite-dimensional inner product space with the
metric defined by the norm is complete; that is, if $(x_n)$ is a sequence of
vectors for which $\|x_n - x_m\| \to 0$, as $n, m \to \infty$, then there is a (unique)
vector $x$ such that $x_n \to x$ as $n \to \infty$.


§ 87. Norm 


The metric properties of vectors have certain important implications 
for the metric properties of linear transformations, which we now begin 
to study. 


DEFINITION. A linear transformation $A$ on an inner product space $\mathcal{V}$
is bounded if there exists a constant $K$ such that $\|Ax\| \le K\|x\|$ for
every vector $x$ in $\mathcal{V}$. The greatest lower bound of all constants $K$ with
this property is called the norm (or bound) of $A$ and is denoted by $\|A\|$.


Clearly if $A$ is bounded, then $\|Ax\| \le \|A\| \cdot \|x\|$ for all $x$. For examples
we may consider the cases where $A$ is a (non-zero) perpendicular projection
or an isometry; § 75, Theorem 1, and the theorem of § 73, respectively,
imply that in both cases $\|A\| = 1$. Consideration of the vectors defined
by $x_n(t) = t^n$ in $\mathcal{P}$ shows that the differentiation transformation is not
bounded.

Because in the sequel we shall have occasion to consider quite a few
upper and lower bounds similar to $\|A\|$, we introduce a convenient
notation. If $P$ is any possible property of real numbers $t$, we shall denote the
set of all real numbers $t$ possessing the property $P$ by the symbol $\{t: P\}$,
and we shall denote greatest lower bound and least upper bound by inf
(for infimum) and sup (for supremum) respectively. In this notation
we have, for example,

$$\|A\| = \inf\,\{K: \|Ax\| \le K\|x\| \text{ for all } x\}.$$


The notion of boundedness is closely connected with the notion of
continuity. If $A$ is bounded and if $\epsilon$ is any positive number, by writing
$\delta = \epsilon/\|A\|$ we may make sure that $\|x - y\| < \delta$ implies that

$$\|Ax - Ay\| = \|A(x - y)\| \le \|A\| \cdot \|x - y\| < \epsilon;$$

in other words boundedness implies (uniform) continuity. (In this proof
we tacitly assumed that $\|A\| \neq 0$; the other case is trivial.) In view of
this fact the following result is a welcome one.


THEOREM. Every linear transformation on a finite-dimensional inner
product space is bounded.


PROOF. Suppose that $A$ is a linear transformation on $\mathcal{V}$; let $\{x_1, \dots, x_N\}$
be an orthonormal basis in $\mathcal{V}$ and write

$$K_0 = \max\,\{\|Ax_1\|, \dots, \|Ax_N\|\}.$$

Since an arbitrary vector $x$ may be written in the form $x = \sum_i (x, x_i)x_i$,
we obtain, applying the Schwarz inequality and remembering that $\|x_i\| = 1$,

$$\begin{aligned}
\|Ax\| &= \Big\|A\Big(\sum_i (x, x_i)x_i\Big)\Big\| \\
&= \Big\|\sum_i (x, x_i)Ax_i\Big\| \le \sum_i |(x, x_i)| \cdot \|Ax_i\| \\
&\le \sum_i \|x\| \cdot \|x_i\| \cdot \|Ax_i\| \le K_0 \sum_i \|x\| \\
&= NK_0\|x\|.
\end{aligned}$$

In other words, $K = NK_0$ is a bound of $A$, and the proof is complete.

It is no accident that the dimension $N$ of $\mathcal{V}$ enters into our evaluation;
we have already seen that the theorem is not true in infinite-dimensional
spaces.
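The bound $K = NK_0$ obtained in the proof is usually far from the norm itself. The following minimal sketch, not part of the original text, compares the two for one real 2-by-2 matrix in the standard orthonormal basis; the helper names `apply`, `norm`, and `crude_bound` are hypothetical, and the norm of $A$ is only estimated by sampling unit vectors, which is an assumption of the sketch and not a method used in the text.

```python
import math


def apply(a, x):
    """Apply the matrix a (a list of rows) to the coordinate vector x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def norm(x):
    """Length of a coordinate vector with respect to an orthonormal basis."""
    return math.sqrt(sum(abs(t) ** 2 for t in x))


def crude_bound(a):
    """Return N * K0, where K0 is the largest ||A x_i|| over the basis vectors."""
    n = len(a)
    basis = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    k0 = max(norm(apply(a, e)) for e in basis)
    return n * k0


A = [[3.0, 4.0], [0.0, 0.0]]
K = crude_bound(A)  # here N = 2 and K0 = 4

# Sample unit vectors on the circle and record the largest ||Ax|| seen;
# this approximates the norm of A from below.
largest = max(
    norm(apply(A, [math.cos(t), math.sin(t)]))
    for t in (2 * math.pi * k / 3600 for k in range(3600))
)
```

For this matrix the bound from the proof is 8, while the sampled values of $\|Ax\|$ on unit vectors never exceed 5, the value attained at $x = (3/5, 4/5)$.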


EXERCISES 


1. (a) Prove that the inner product is a continuous function (and therefore so
also is the norm); that is, if $x_n \to x$ and $y_n \to y$, then $(x_n, y_n) \to (x, y)$.
(b) Is every linear functional continuous? How about multilinear forms?


2. A linear transformation $A$ on an inner product space is said to be bounded
from below if there exists a (strictly) positive constant $K$ such that $\|Ax\| \ge K\|x\|$
for every $x$. Prove that (on a finite-dimensional space) $A$ is bounded from below
if and only if it is invertible.


3. If a linear transformation on an inner product space (not necessarily finite- 
dimensional) is continuous at one point, then it is bounded (and consequently con- 
tinuous over the whole space). 


4. For each positive integer $n$ construct a projection $E_n$ (not a perpendicular
projection) such that $\|E_n\| = n$.


5. (a) If $U$ is a partial isometry other than 0, then $\|U\| = 1$.
(b) If $U$ is an isometry, then $\|UA\| = \|AU\| = \|A\|$ for every linear
transformation $A$.




6. If $E$ and $F$ are perpendicular projections, with ranges $\mathfrak{M}$ and $\mathfrak{N}$ respectively,
and if $\|E - F\| < 1$, then $\dim \mathfrak{M} = \dim \mathfrak{N}$.


7. (a) If $A$ is normal, then $\|A^n\| = \|A\|^n$ for every positive integer $n$.

(b) If $A$ is a linear transformation on a 2-dimensional unitary space and if
$\|A^2\| = \|A\|^2$, then $A$ is normal.

(c) Is the conclusion of (b) true for transformations on a 3-dimensional space?


§ 88. Expressions for the norm 


To facilitate working with the norm of a transformation, we consider the 
following four expressions: 


$$\begin{aligned}
p &= \sup\,\{\|Ax\|/\|x\| : x \neq 0\}, \\
q &= \sup\,\{\|Ax\| : \|x\| = 1\}, \\
r &= \sup\,\{|(Ax, y)|/(\|x\| \cdot \|y\|) : x \neq 0,\ y \neq 0\}, \\
s &= \sup\,\{|(Ax, y)| : \|x\| = \|y\| = 1\}.
\end{aligned}$$

In accordance with our definition of the brace notation, the expression
$\{\|Ax\| : \|x\| = 1\}$, for example, means the set of all real numbers of the
form $\|Ax\|$, considered for all $x$'s for which $\|x\| = 1$.

Since $\|Ax\| \le K\|x\|$ is trivially true with any $K$ if $x = 0$, the definition
of supremum implies that $p = \|A\|$; we shall prove that, in fact,
$p = q = r = s = \|A\|$. Since the supremum in the expression for $q$ is extended
over a subset of the corresponding set for $p$ (that is, if $\|x\| = 1$, then
$\|Ax\|/\|x\| = \|Ax\|$), we see that $q \le p$; a similar argument shows that
$s \le r$.

For any $x \neq 0$ we consider $y = x/\|x\|$ (so that $\|y\| = 1$); we have
$\|Ax\|/\|x\| = \|Ay\|$. In other words, every number of the set whose
supremum is $p$ occurs also in the corresponding set for $q$; it follows that
$p \le q$, and consequently that $p = q = \|A\|$.

Similarly if $x \neq 0$ and $y \neq 0$, we consider $x' = x/\|x\|$ and $y' = y/\|y\|$;
we have

$$|(Ax, y)|/(\|x\| \cdot \|y\|) = |(Ax', y')|,$$

and hence, by the argument just used, $r \le s$, so that $r = s$.

To consolidate our position, we note that so far we have proved that

$$p = q = \|A\| \quad \text{and} \quad r = s.$$

Since

$$\frac{|(Ax, y)|}{\|x\| \cdot \|y\|} \le \frac{\|Ax\| \cdot \|y\|}{\|x\| \cdot \|y\|} = \frac{\|Ax\|}{\|x\|},$$

it follows that $r \le p$; we shall complete the proof by showing that $p \le r$.
For this purpose we consider any vector $x$ for which $Ax \neq 0$ (so that
$x \neq 0$); for such an $x$ we write $y = Ax$ and we have

$$\|Ax\|/\|x\| = |(Ax, y)|/(\|x\| \cdot \|y\|).$$

In other words, we proved that every number that occurs in the set defining
$p$, and is different from zero, occurs also in the set of which $r$ is the supremum;
this clearly implies the desired result.

The numerical function of a transformation $A$ given by $\|A\|$ satisfies
the following four conditions:


(1) $\|A + B\| \le \|A\| + \|B\|$,
(2) $\|AB\| \le \|A\| \cdot \|B\|$,
(3) $\|\alpha A\| = |\alpha| \cdot \|A\|$,
(4) $\|A^*\| = \|A\|$.


The proof of the first three of these is immediate from the definition of the
norm of a transformation; for the proof of (4) we use the equation $\|A\| = r$,
as follows. Since

$$|(Ax, y)| = |(x, A^*y)| \le \|x\| \cdot \|A^*y\| \le \|A^*\| \cdot \|x\| \cdot \|y\|,$$

we see that $\|A\| \le \|A^*\|$; replacing $A$ by $A^*$ and $A^*$ by $A^{**} = A$, we
obtain the reverse inequality.


EXERCISES 


1. If $B$ is invertible, then $\|AB\| \ge \|A\|/\|B^{-1}\|$ for every $A$.

2. Is it true for every linear transformation $A$ that $\|A^*A\| = \|AA^*\|$?


3. (a) If $A$ is Hermitian and if $\alpha \ge 0$, then a necessary and sufficient condition
that $\|A\| \le \alpha$ is that $-\alpha \le A \le \alpha$.

(b) If $A$ is Hermitian, if $\alpha \le A \le \beta$, and if $p$ is a polynomial such that $p(t) \ge 0$
whenever $\alpha \le t \le \beta$, then $p(A) \ge 0$.

(c) If $A$ is Hermitian, if $\alpha \le A \le \beta$, and if $p$ is a polynomial such that $p(t) \neq 0$
whenever $\alpha \le t \le \beta$, then $p(A)$ is invertible.


§ 89. Bounds of a self-adjoint transformation 


As usual we can say a little more about the special case of self-adjoint
transformations than in the general case. We consider, for any self-adjoint
transformation $A$, the sets of real numbers

$$\Phi = \{(Ax, x)/\|x\|^2 : x \neq 0\}$$

and

$$\Psi = \{(Ax, x) : \|x\| = 1\}.$$

It is clear that $\Psi \subset \Phi$. If, for every $x \neq 0$, we write $y = x/\|x\|$, then
$\|y\| = 1$ and $(Ax, x)/\|x\|^2 = (Ay, y)$, so that every number in $\Phi$ occurs
also in $\Psi$ and consequently $\Phi = \Psi$. We write

$$\alpha = \inf \Phi = \inf \Psi,$$
$$\beta = \sup \Phi = \sup \Psi,$$

and we say that $\alpha$ is the lower bound and $\beta$ is the upper bound of the
self-adjoint transformation $A$. If we recall the definition of a positive
transformation, we see that $\alpha$ is the greatest real number for which $A - \alpha \ge 0$
and $\beta$ is the least real number for which $\beta - A \ge 0$. Concerning
these numbers we assert that

$$\gamma = \max\,\{|\alpha|, |\beta|\} = \|A\|.$$
Half the proof is easy. Since 
|(Az,z)| < | Az- lell s iA el, 


it is clear that both $|\alpha|$ and $|\beta|$ are dominated by $\|A\|$. To prove the
reverse inequality, we observe that the positive character of the two
linear transformations $\gamma - A$ and $\gamma + A$ implies that both

$$(\gamma + A)^*(\gamma - A)(\gamma + A) = (\gamma + A)(\gamma - A)(\gamma + A)$$
and
$$(\gamma - A)^*(\gamma + A)(\gamma - A) = (\gamma - A)(\gamma + A)(\gamma - A)$$

are positive, and, therefore, so also is their sum $2\gamma(\gamma^2 - A^2)$. (The first
product is $(\gamma^2 - A^2)(\gamma + A)$ and the second is $(\gamma^2 - A^2)(\gamma - A)$.) Since
$\gamma = 0$ implies $\|A\| = 0$, the assertion is trivial in this case; in any other
case we may divide by $2\gamma$ and obtain the result that $\gamma^2 - A^2 \geq 0$. In
other words,

$$\gamma^2 \|x\|^2 = \gamma^2 (x, x) \geq (A^2 x, x) = \|Ax\|^2,$$

whence $\gamma \geq \|A\|$, and the proof is complete.
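
The assertion can be checked numerically in a small case. The following is only a sketch under stated assumptions: the matrix `A` and the helper `sample_bounds` are illustrative choices, not part of the text, and the lower bound, the upper bound, and the norm are estimated directly from their definitions by sampling unit vectors in the real plane.

```python
import math

def sample_bounds(A, steps=20000):
    """Estimate alpha, beta, and ||A|| for a real symmetric 2x2 matrix A
    by sampling unit vectors x = (cos t, sin t), 0 <= t < pi.
    (x and -x give the same values, so half the circle is enough.)"""
    alpha, beta, norm = math.inf, -math.inf, 0.0
    for k in range(steps):
        t = math.pi * k / steps
        x = (math.cos(t), math.sin(t))
        Ax = (A[0][0] * x[0] + A[0][1] * x[1],
              A[1][0] * x[0] + A[1][1] * x[1])
        q = Ax[0] * x[0] + Ax[1] * x[1]              # (Ax, x)
        alpha = min(alpha, q)                        # running inf of (Ax, x)
        beta = max(beta, q)                          # running sup of (Ax, x)
        norm = max(norm, math.hypot(Ax[0], Ax[1]))   # ||Ax|| with ||x|| = 1
    return alpha, beta, norm

# Hypothetical example matrix (symmetric, hence self-adjoint).
A = [[1.0, 2.0], [2.0, -3.0]]
alpha, beta, norm = sample_bounds(A)
gamma = max(abs(alpha), abs(beta))
print(alpha, beta, gamma, norm)
```

For this particular matrix the two printed quantities `gamma` and `norm` agree up to the sampling error, and here the lower bound is the one that determines the norm.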

We call the reader’s attention to the fact that the computation in the 
main body of this proof could have been avoided entirely. Since both 
$\gamma - A$ and $\gamma + A$ are positive, and since they commute, we may conclude
immediately (§ 82) that their product $\gamma^2 - A^2$ is positive. We presented
the roundabout method in accordance with the principle that, with an 
eye to the generalizations of the theory, one should avoid using the spectral 
theorem whenever possible. Our proof of the fact that the positiveness 
and commutativity of A and B imply the positiveness of AB was based 
on the existence of square roots for positive transformations. This fact, 
to be sure, can be obtained by so-called “elementary” methods, that is, 
methods not using the spectral theorem, but even the simplest elementary 
proof involves complications that are purely technical and, for our purposes,
not particularly useful.


§ 90. Minimax principle 


A very elegant and useful fact concerning self-adjoint transformations 
is the following minimax principle. 


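THEOREM. Let $A$ be a self-adjoint transformation on an $n$-dimensional
inner product space $\mathcal{V}$, and let $\lambda_1, \ldots, \lambda_n$ be the (not necessarily distinct)
proper values of $A$, with the notation so chosen that $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$.
If, for each subspace $\mathcal{M}$ of $\mathcal{V}$,

$$\mu(\mathcal{M}) = \sup \{(Ax, x) : x \text{ in } \mathcal{M},\ \|x\| = 1\},$$
and if, for $k = 1, \ldots, n$,
$$\mu_k = \inf \{\mu(\mathcal{M}) : \dim \mathcal{M} = n - k + 1\},$$
then $\mu_k = \lambda_k$ for $k = 1, \ldots, n$.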


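PROOF. Let $\{x_1, \ldots, x_n\}$ be an orthonormal basis in $\mathcal{V}$ for which
$Ax_i = \lambda_i x_i$, $i = 1, \ldots, n$ (§ 79); let $\mathcal{M}_k$ be the subspace spanned by $x_1, \ldots, x_k$,
for $k = 1, \ldots, n$. Since the dimension of $\mathcal{M}_k$ is $k$, the subspace $\mathcal{M}_k$ cannot
be disjoint from any $(n - k + 1)$-dimensional subspace $\mathcal{M}$ in $\mathcal{V}$ (the two
dimensions add up to $n + 1 > n$, so the subspaces have a non-zero vector in common); if $\mathcal{M}$ is
any such subspace, we may find a vector $x$ belonging to both $\mathcal{M}_k$ and $\mathcal{M}$
and such that $\|x\| = 1$. For this $x = \sum_{i=1}^{k} \xi_i x_i$ we have

$$(Ax, x) = \sum_{i=1}^{k} \lambda_i |\xi_i|^2 \geq \lambda_k \sum_{i=1}^{k} |\xi_i|^2 = \lambda_k \|x\|^2 = \lambda_k,$$
so that $\mu(\mathcal{M}) \geq \lambda_k$.

If, on the other hand, we consider the particular $(n - k + 1)$-dimensional
subspace $\mathcal{M}_0$ spanned by $x_k, x_{k+1}, \ldots, x_n$, then, for each $x = \sum_{i=k}^{n} \xi_i x_i$
in this subspace, we have (assuming $\|x\| = 1$)

$$(Ax, x) = \sum_{i=k}^{n} \lambda_i |\xi_i|^2 \leq \lambda_k \sum_{i=k}^{n} |\xi_i|^2 = \lambda_k \|x\|^2 = \lambda_k,$$
so that $\mu(\mathcal{M}_0) \leq \lambda_k$.

In other words, as $\mathcal{M}$ runs over all $(n - k + 1)$-dimensional subspaces,
$\mu(\mathcal{M})$ is always $\geq \lambda_k$, and is at least once $\leq \lambda_k$; this shows that $\mu_k = \lambda_k$,
as was to be proved.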

In particular for $k = 1$ we see (using § 89) that if $A$ is self-adjoint, then
$\|A\|$ is equal to the maximum of the absolute values of the proper values
of $A$.
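
In two dimensions the minimax principle can be watched at work, because every one-dimensional subspace is spanned by a vector $(\cos t, \sin t)$. The sketch below is an illustration only; the matrix `A` (with proper values 3 and 1) and the sampling grid are assumptions made for the example, not something taken from the text.

```python
import math

# Hypothetical symmetric example; its proper values are lambda_1 = 3, lambda_2 = 1.
A = [[2.0, 1.0], [1.0, 2.0]]

def rayleigh(t):
    """(Ax, x) for the unit vector x = (cos t, sin t); uses the symmetry of A."""
    c, s = math.cos(t), math.sin(t)
    return A[0][0] * c * c + 2.0 * A[0][1] * c * s + A[1][1] * s * s

# Every 1-dimensional subspace is spanned by some (cos t, sin t), 0 <= t < pi.
values = [rayleigh(math.pi * k / 1800) for k in range(1800)]

# k = 1: the only subspace of dimension n - k + 1 = 2 is the whole plane,
# so mu_1 is the supremum of (Ax, x) over all unit vectors.
mu_1 = max(values)
# k = 2: on a line the unit vectors are x and -x, so mu(M) is a single value;
# mu_2 is the infimum of that value over all lines.
mu_2 = min(values)
print(mu_1, mu_2)
```

Up to rounding, the two printed numbers are the proper values 3 and 1 of the example matrix, in the order $\lambda_1 \geq \lambda_2$.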
EXERCISES


1. If $\lambda$ is a proper value of a linear transformation $A$ on a finite-dimensional inner
product space, then $|\lambda| \leq \|A\|$.


2. If $A$ and $B$ are linear transformations on a finite-dimensional unitary space,
and if $C = AB - BA$, then $\|1 - C\| \geq 1$. (Hint: consider the proper values of
$C$.)


3. If $A$ and $B$ are linear transformations on a finite-dimensional unitary space,
if $C = AB - BA$, and if $C$ commutes with $A$, then $C$ is not invertible. (Hint:
if $C$ is invertible, then $2\|B\| \cdot \|A\| \cdot \|A^{k-1}\| \geq k \|A^{k-1}\| / \|C^{-1}\|$.)


4. (a) If A is a normal linear transformation on a finite-dimensional unitary 
space, then || A || is equal to the maximum of the absolute values of the proper 
values of A. 

(b) Does the conclusion of (a) remain true if the hypothesis of normality is 
omitted? 


5. The spectral radius of a linear transformation A on a finite-dimensional 
unitary space, denoted by r(A), is the maximum of the absolute values of the proper 
values of A. 

(a) If $f(\lambda) = ((1 - \lambda A)^{-1}x, y)$, then $f$ is an analytic function of $\lambda$ in the region
determined by $|\lambda| < \frac{1}{r(A)}$ (for each fixed $x$ and $y$).

(b) There exists a constant $K$ such that $|\lambda|^n \|A^n\| \leq K$ whenever $|\lambda| < \frac{1}{r(A)}$
and $n = 0, 1, 2, \ldots$. (Hint: for each $x$ and $y$ there exists a constant $K$ such that
$|\lambda^n (A^n x, y)| \leq K$ for all $n$.)

(c) $\limsup_n \|A^n\|^{1/n} \leq r(A)$.

(d) $(r(A))^n \leq r(A^n)$, $n = 0, 1, 2, \ldots$.

(e) $r(A) = \lim_n \|A^n\|^{1/n}$.

6. If $A$ is a linear transformation on a finite-dimensional unitary space, then
a necessary and sufficient condition that $r(A) = \|A\|$ is that $\|A^n\| = \|A\|^n$
for $n = 0, 1, 2, \ldots$.

7. (a) If $A$ is a positive linear transformation on a finite-dimensional inner
product space, and if $AB$ is self-adjoint, then

$$|(ABx, x)| \leq \|B\| \cdot (Ax, x)$$
for every vector $x$.
(b) Does the conclusion of (a) remain true if $\|B\|$ is replaced by $r(B)$?


§ 91. Convergence of linear transformations 


We return now to the consideration of convergence problems. There 
are three obvious senses in which we may try to define the convergence of a 
sequence $(A_n)$ of linear transformations to a fixed linear transformation $A$.


(i) $\|A_n - A\| \to 0$ as $n \to \infty$.
(ii) $\|A_n x - Ax\| \to 0$ as $n \to \infty$ for each fixed $x$.
(iii) $|(A_n x, y) - (Ax, y)| \to 0$ as $n \to \infty$ for each fixed $x$ and $y$.

If (i) is true, then, for every $x$,
$$\|A_n x - Ax\| = \|(A_n - A)x\| \leq \|A_n - A\| \cdot \|x\| \to 0,$$

so that (i) $\Rightarrow$ (ii). We have already seen (§ 86) that (ii) $\Rightarrow$ (iii) and that
in finite-dimensional spaces (iii) $\Rightarrow$ (ii). It is even true that in finite-dimensional
spaces (ii) $\Rightarrow$ (i), so that all three conditions are equivalent.
To prove this, let $\{x_1, \ldots, x_N\}$ be an orthonormal basis in $\mathcal{V}$. If we suppose
that (ii) holds, then, for each $\varepsilon > 0$, we may find an $n_0 = n_0(\varepsilon)$ such that
$\|A_n x_i - A x_i\| < \varepsilon$ for $n \geq n_0$ and for $i = 1, \ldots, N$. It follows that for
an arbitrary $x = \sum_i (x, x_i) x_i$ we have

$$\|(A_n - A)x\| = \Big\| \sum_i (x, x_i)(A_n - A)x_i \Big\| \leq \sum_i \|x\| \cdot \|(A_n - A)x_i\| \leq N \varepsilon \|x\|,$$
and this implies (i).

It is also easy to prove that if the norm is used to define a distance for
transformations, then the resulting metric space is complete, that is,
if $\|A_n - A_m\| \to 0$ as $n, m \to \infty$, then there is an $A$ such that
$\|A_n - A\| \to 0$. The proof of this fact is reduced to the corresponding
fact for vectors. If $\|A_n - A_m\| \to 0$, then $\|A_n x - A_m x\| \to 0$ for
each $x$, so that we may find a vector corresponding to $x$, which we may
denote by $Ax$, say, such that $\|A_n x - Ax\| \to 0$. It is clear that the correspondence
from $x$ to $Ax$ is given by a linear transformation $A$; the
implication relation (ii) $\Rightarrow$ (i) proved above completes the proof.

Now that we know what convergence means for linear transformations, 
it behooves us to examine some simple functions of these transformations 
in order to verify their continuity. We assert that $\|A\|$, $\|Ax\|$, $(Ax, y)$,
$Ax$, $A + B$, $\alpha A$, $AB$, and $A^*$ all define continuous functions of all their
arguments simultaneously. (Observe that the first three are numerical-valued
functions, the next is vector-valued, and the last four are transformation-valued.)
The proofs of these statements are all quite easy, and
similar to each other; to illustrate the ideas we discuss $\|A\|$, $Ax$, and $A^*$.

(1) If $A_n \to A$, that is, $\|A_n - A\| \to 0$, then, since the relations

$$\|A_n\| \leq \|A_n - A\| + \|A\|$$
and
$$\|A\| \leq \|A - A_n\| + \|A_n\|$$

imply that
$$\big|\, \|A_n\| - \|A\| \,\big| \leq \|A_n - A\|,$$

we see that $\|A_n\| \to \|A\|$.

(2) If $A_n \to A$ and $x_n \to x$, then
$$\|A_n x_n - Ax\| \leq \|A_n x_n - A x_n\| + \|A x_n - Ax\| \to 0,$$

so that $A_n x_n \to Ax$.

(3) If $A_n \to A$, then, for each $x$ and $y$,

$$(A_n^* x, y) = (x, A_n y) = \overline{(A_n y, x)} \to \overline{(Ay, x)} = \overline{(y, A^* x)} = (A^* x, y),$$

whence $A_n^* \to A^*$.
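
The inequality used in (1) can be observed numerically. In the sketch below the matrices `A` and `E` and the helper `op_norm` are illustrative assumptions; `op_norm` relies on the standard fact that, for a real 2-by-2 matrix $M$, $\|M\|^2$ is the largest proper value of $M^*M$.

```python
import math

def op_norm(M):
    """Operator norm of a real 2x2 matrix: sqrt of the largest proper value of M*M."""
    a = M[0][0] ** 2 + M[1][0] ** 2                  # (M*M)[0][0]
    b = M[0][0] * M[0][1] + M[1][0] * M[1][1]        # (M*M)[0][1]
    d = M[0][1] ** 2 + M[1][1] ** 2                  # (M*M)[1][1]
    return math.sqrt((a + d) / 2.0 + math.hypot((a - d) / 2.0, b))

# Hypothetical data: A_n = A + E/n converges to A in norm.
A = [[1.0, 2.0], [3.0, 4.0]]
E = [[0.0, 1.0], [1.0, 0.0]]

gaps = []
for n in range(1, 51):
    A_n = [[A[i][j] + E[i][j] / n for j in range(2)] for i in range(2)]
    diff = [[A_n[i][j] - A[i][j] for j in range(2)] for i in range(2)]
    # Pair: ( | ||A_n|| - ||A|| | , ||A_n - A|| ); the first never exceeds the second.
    gaps.append((abs(op_norm(A_n) - op_norm(A)), op_norm(diff)))

print(gaps[0], gaps[-1])
```

Both entries of each pair shrink as $n$ grows, which is the continuity of the norm in this small example.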


EXERCISES 


1. A sequence $(A_n)$ of linear transformations converges to a linear transformation
$A$ if and only if, for every coordinate system, each entry in the matrix of $A_n$ converges,
as $n \to \infty$, to the corresponding entry in the matrix of $A$.


2. For every linear transformation $A$ there exists a sequence $(A_n)$ of invertible
linear transformations such that $A_n \to A$.


3. If $E$ and $F$ are perpendicular projections, then $(EFE)^n$ converges, as $n \to \infty$,
to the projection whose range is the intersection of the ranges of $E$ and $F$.


4. If $A$ is a linear transformation on a finite-dimensional unitary space, then a
necessary and sufficient condition that $A^n \to 0$ is that all the proper values of
$A$ be (strictly) less than 1 in absolute value.


5. Prove that if $A$ is the $n$-by-$n$ matrix

$$\begin{pmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
\frac{1}{n} & \frac{1}{n} & \frac{1}{n} & \cdots & \frac{1}{n}
\end{pmatrix},$$

then $A^k$ converges, as $k \to \infty$, to a projection whose range is one-dimensional;
find the range.


6. Prove that det and tr are continuous. 


§ 92. Ergodic theorem 


The routine work is out of the way; we go on to illustrate the general 
theory by considering some very special but quite important convergence 
problems. 
THEOREM. If $U$ is an isometry on a finite-dimensional inner product
space, and if $\mathcal{M}$ is the subspace of all solutions of $Ux = x$, then the sequence
defined by

$$V_n = \frac{1}{n}(1 + U + \cdots + U^{n-1})$$

converges as $n \to \infty$ to the perpendicular projection $E = P_{\mathcal{M}}$.


PROOF. Let $\mathcal{N}$ be the range of the linear transformation $1 - U$. If
$x = y - Uy$ is in $\mathcal{N}$, then

$$V_n x = \frac{1}{n}(y - Uy + Uy - U^2 y + \cdots + U^{n-1}y - U^n y) = \frac{1}{n}(y - U^n y),$$
so that
$$\|V_n x\| = \frac{1}{n}\|y - U^n y\| \leq \frac{1}{n}(\|y\| + \|U^n y\|) = \frac{2}{n}\|y\|.$$

This implies that $V_n x$ converges to zero when $x$ is in $\mathcal{N}$.

On the other hand, if $x$ is in $\mathcal{M}$, that is, $Ux = x$, then $V_n x = x$, so that
in this case $V_n x$ certainly converges to $x$.

We shall complete the proof by showing that $\mathcal{N}^\perp = \mathcal{M}$. (This will
imply that every vector is a sum of two vectors for which $(V_n)$ converges,
so that $(V_n)$ converges everywhere. What we have already proved about
the limit of $(V_n)$ in $\mathcal{M}$ and in $\mathcal{N}$ shows that $(V_n x)$ always converges to the
projection of $x$ in $\mathcal{M}$.) To show that $\mathcal{N}^\perp = \mathcal{M}$, we observe that $x$ is in the
orthogonal complement of $\mathcal{N}$ if and only if $(x, y - Uy) = 0$ for all $y$.
This in turn implies that

$$0 = (x, y - Uy) = (x, y) - (x, Uy) = (x, y) - (U^* x, y) = (x - U^* x, y),$$

that is, that $x - U^* x = x - U^{-1}x$ is orthogonal to every vector $y$, so
that $x - U^{-1}x = 0$, $x = U^{-1}x$, or $Ux = x$. Reading the last computation
from right to left shows that this necessary condition is also sufficient;
we need only to recall the definition of $\mathcal{M}$ to see that $\mathcal{M} = \mathcal{N}^\perp$.
This very ingenious proof, which works with only very slight modifications
in most of the important infinite-dimensional cases, is due to F. Riesz.
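
A small numerical illustration of the theorem, under assumptions of our own choosing: `U` below is the cyclic permutation of the coordinates of real three-dimensional space (an isometry whose fixed vectors are the multiples of $(1, 1, 1)$), and `ergodic_average` is a hypothetical helper that forms $V_n$ directly from its definition.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def ergodic_average(U, n):
    """V_n = (1/n)(1 + U + ... + U^(n-1))."""
    size = len(U)
    power = [[1 if i == j else 0 for j in range(size)] for i in range(size)]  # U^0
    total = [[0] * size for _ in range(size)]
    for _ in range(n):
        total = [[total[i][j] + power[i][j] for j in range(size)]
                 for i in range(size)]
        power = matmul(U, power)          # next power of U
    return [[entry / n for entry in row] for row in total]

# Cyclic permutation of coordinates: U(x1, x2, x3) = (x3, x1, x2).
U = [[0, 0, 1],
     [1, 0, 0],
     [0, 1, 0]]

V = ergodic_average(U, 1000)
for row in V:
    print(row)
```

Every entry of `V` is within $1/1000$ of $1/3$, and the matrix with all entries $1/3$ is the perpendicular projection on the multiples of $(1, 1, 1)$, as the theorem predicts for this example.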
§ 93. Power series


We consider next the so-called Neumann series $\sum_{n=0}^{\infty} A^n$, where $A$ is
a linear transformation with norm $< 1$ on a finite-dimensional vector space.
If we write
$$S_p = \sum_{n=0}^{p} A^n,$$
then
$$(1 - A)S_p = S_p - AS_p = 1 - A^{p+1}. \tag{1}$$

To prove that $S_p$ has a limit as $p \to \infty$, we consider (for any two indices
$p$ and $q$ with $p > q$)

$$\|S_p - S_q\| \leq \sum_{n=q+1}^{p} \|A^n\| \leq \sum_{n=q+1}^{p} \|A\|^n.$$

Since $\|A\| < 1$, the last written quantity approaches zero as $p, q \to \infty$;
it follows that $S_p$ has a limit $S$ as $p \to \infty$. To evaluate the limit we observe
that $1 - A$ is invertible. (Proof: $(1 - A)x = 0$ implies that $Ax = x$,
and, if $x \neq 0$, this implies that $\|Ax\| = \|x\| > \|A\| \cdot \|x\|$, a contradiction.)
Hence we may write (1) in the form

$$S_p = (1 - A^{p+1})(1 - A)^{-1} = (1 - A)^{-1}(1 - A^{p+1}); \tag{2}$$

since $A^{p+1} \to 0$ as $p \to \infty$, it follows that $S = (1 - A)^{-1}$.
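
The conclusion $S = (1 - A)^{-1}$ is easy to test numerically. The sketch below uses a hypothetical 2-by-2 matrix `A` whose entries have squares adding up to 0.14, which forces $\|A\| < 1$; the helper names are ours.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def neumann_sum(A, p):
    """S_p = 1 + A + ... + A^p."""
    n = len(A)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # A^0
    total = [row[:] for row in term]
    for _ in range(p):
        term = matmul(term, A)            # next power of A
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

A = [[0.2, 0.1],
     [0.0, 0.3]]
S = neumann_sum(A, 60)
one_minus_A = [[(1.0 if i == j else 0.0) - A[i][j] for j in range(2)]
               for i in range(2)]
check = matmul(one_minus_A, S)            # equals 1 - A^61, very close to 1
print(check)
```

By equation (1) the product `check` differs from the identity only by $A^{61}$, which is far below rounding error here.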

As another example of an infinite series of transformations we consider
the exponential series. For an arbitrary linear transformation $A$ (not
necessarily with $\|A\| < 1$) we write

$$S_p = \sum_{n=0}^{p} \frac{1}{n!} A^n.$$
Since we have
$$\|S_p - S_q\| \leq \sum_{n=q+1}^{p} \frac{1}{n!} \|A\|^n,$$

and since the right side of this inequality, being a part of the power series
for $\exp \|A\| = e^{\|A\|}$, converges to 0 as $p, q \to \infty$, we see that there is a
linear transformation $S$ such that $S_p \to S$. We write $S = \exp A$; we shall
merely mention some of the elementary properties of this function of $A$.

Consideration of the triangular forms of $A$ and of $S_p$ shows that the
proper values of $\exp A$, together with their algebraic multiplicities, are
equal to the exponentials of the proper values of $A$. (This argument, as
well as some of the ones that follow, applies directly to the complex case
only; the real case has to be deduced via complexification.) From the
consideration of the triangular form it follows also that the determinant of
$\exp A$, that is, $\prod_{i=1}^{N} \exp \lambda_i$, where $\lambda_1, \ldots, \lambda_N$ are the (not necessarily
distinct) proper values of $A$, is the same as $\exp(\lambda_1 + \cdots + \lambda_N) = \exp(\operatorname{tr} A)$.
Since $\exp \zeta \neq 0$, this shows, incidentally, that $\exp A$ is always
invertible.
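
The identity $\det \exp A = \exp(\operatorname{tr} A)$ can be checked on a triangular example. This is a sketch only: the matrix `A` and the number of terms `p = 60` are arbitrary illustrative choices, and `exp_series` simply forms the partial sum $S_p$ defined above.

```python
import math

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def exp_series(A, p=60):
    """Partial sum S_p = sum over n = 0..p of A^n / n!."""
    n = len(A)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # A^0 / 0!
    total = [row[:] for row in term]
    for k in range(1, p + 1):
        # Turn A^(k-1)/(k-1)! into A^k/k!.
        term = [[entry / k for entry in row] for row in matmul(term, A)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

A = [[1.0, 2.0],
     [0.0, 3.0]]                      # triangular: proper values 1 and 3, trace 4
S = exp_series(A)
det_S = S[0][0] * S[1][1] - S[0][1] * S[1][0]
print(S[0][0], S[1][1])               # approximately e and e**3
print(det_S, math.exp(4.0))           # the two numbers agree up to rounding
```

The diagonal of `S` carries the exponentials of the proper values of `A`, which is the triangular-form argument of the text in miniature.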

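Considered as a function of linear transformations the exponential
retains many of the simple properties of the ordinary numerical exponential
function. Let us, for example, take any two commutative linear transformations
$A$ and $B$. Since $\exp(A + B) - \exp A \exp B$ is the limit (as $p \to \infty$)
of the expression

$$\sum_{n=0}^{p} \frac{1}{n!}(A + B)^n - \sum_{m=0}^{p} \frac{1}{m!} A^m \cdot \sum_{k=0}^{p} \frac{1}{k!} B^k
= \sum_{n=0}^{p} \frac{1}{n!} \sum_{j=0}^{n} \binom{n}{j} A^j B^{n-j} - \sum_{m=0}^{p} \sum_{k=0}^{p} \frac{1}{m!\,k!} A^m B^k,$$

we will have proved the multiplication rule for exponentials when we have
proved that this expression converges to zero. (Here $\binom{n}{j}$ stands for the
combinatorial coefficient $\frac{n!}{j!\,(n-j)!}$.) An easy verification yields the fact
that for $k + m \leq p$ the product $A^m B^k$ occurs in both terms of the last
written expression with coefficients that differ in sign only. The terms
that do not cancel out are all in the subtrahend and are together equal to

$$\sum_m \sum_k \frac{1}{m!\,k!} A^m B^k,$$

the summation being extended over those values of $m$ and $k$ that are $\leq p$
and for which $m + k > p$. Since $m + k > p$ implies that at least one
of the two integers $m$ and $k$ is greater than the integer part of $\frac{p}{2}$ (in
symbols $[\frac{p}{2}]$), the norm of this remainder is dominated by

$$\sum_{m=0}^{\infty} \sum_{k=[p/2]}^{\infty} \frac{1}{m!\,k!} \|A\|^m \|B\|^k
+ \sum_{k=0}^{\infty} \sum_{m=[p/2]}^{\infty} \frac{1}{m!\,k!} \|A\|^m \|B\|^k$$

$$= \Big( \sum_{m=0}^{\infty} \frac{1}{m!} \|A\|^m \Big) \Big( \sum_{k=[p/2]}^{\infty} \frac{1}{k!} \|B\|^k \Big)
+ \Big( \sum_{k=0}^{\infty} \frac{1}{k!} \|B\|^k \Big) \Big( \sum_{m=[p/2]}^{\infty} \frac{1}{m!} \|A\|^m \Big)$$

$$= (\exp \|A\|)\,\alpha_p + (\exp \|B\|)\,\beta_p,$$
where $\alpha_p \to 0$ and $\beta_p \to 0$ as $p \to \infty$.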
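
The multiplication rule can be tried on a pair of commuting matrices. In this sketch the matrices are hypothetical: `B` is chosen as $(A + 3)/2$ so that it commutes with `A`, and the exponentials are approximated by 60-term partial sums of the series.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def exp_series(A, p=60):
    """Partial sum S_p = sum over n = 0..p of A^n / n!."""
    n = len(A)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for k in range(1, p + 1):
        term = [[entry / k for entry in row] for row in matmul(term, A)]  # A^k / k!
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

A = [[1.0, 2.0], [0.0, 3.0]]
B = [[2.0, 1.0], [0.0, 3.0]]          # B = (A + 3)/2, so A and B commute
assert matmul(A, B) == matmul(B, A)

A_plus_B = [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]
left = exp_series(A_plus_B)                       # approximates exp(A + B)
right = matmul(exp_series(A), exp_series(B))      # approximates exp A exp B
worst = max(abs(left[i][j] - right[i][j]) for i in range(2) for j in range(2))
print(worst)
```

The printed discrepancy is at the level of rounding error. The commutativity hypothesis is essential to the argument in the text; the example says nothing about matrices that fail to commute.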
Similar methods serve to treat $f(A)$, where $f$ is any function representable
by a power series,
$$f(\zeta) = \sum_{n=0}^{\infty} \alpha_n \zeta^n,$$

and where $\|A\|$ is (strictly) smaller than the radius of convergence of the
series. We leave it to the reader to verify that the functional calculus 
we are here hinting at is consistent with the functional calculus for normal 
transformations. Thus, for example, exp A as defined above is the same 
linear transformation as is defined by our previous notion of exp A in case 
A is normal. 


EXERCISES 


1. Give an alternative proof of the ergodic theorem, based on the spectral
theorem for unitary transformations.


2. Prove that if $\|1 - A\| < 1$, then $A$ is invertible, by considering the formal
power series expansion of $(1 - (1 - A))^{-1}$.


APPENDIX 


HILBERT SPACE 


Probably the most useful and certainly the best developed generalization 
of the theory of finite-dimensional inner product spaces is the theory of 
Hilbert space. Without going into details and entirely without proofs 
we shall now attempt to indicate how this generalization proceeds and 
what are the main difficulties that have to be overcome. 

The definition of Hilbert space is easy: it is an inner product space satisfying
one extra condition. That this condition (namely, completeness) is
automatically satisfied in the finite-dimensional case is proved in elementary
analysis. In the infinite-dimensional case it may be possible that
for a sequence $(x_n)$ of vectors $\|x_n - x_m\| \to 0$ as $n, m \to \infty$, but still
there is no vector $x$ for which $\|x_n - x\| \to 0$; the only effective way of
ruling out this possibility is explicitly to assume its opposite. In other
words: a Hilbert space is a complete inner product space. (Sometimes the 
concept of Hilbert space is restricted by additional conditions, whose 
purpose is to limit the size of the space from both above and below. The 
most usual conditions require that the space be infinite-dimensional and 
separable. In recent years, ever since the realization that such additional 
restrictions do not pay for themselves in results, it has become customary 
to use “Hilbert space” for the concept we defined.) 

It is easy to see that the space of polynomials with the inner product
defined by $(x, y) = \int_0^1 x(t)\overline{y(t)}\,dt$ is not complete. In connection with the
completeness of certain particular Hilbert spaces there is quite an extensive
mathematical lore. Thus, for instance, the main assertion of the celebrated
Riesz-Fischer theorem is that a Hilbert space manufactured out of the
set of all those functions $x$ for which $\int_0^1 |x(t)|^2\,dt < \infty$ (in the sense of
Lebesgue integration) is a Hilbert space (with formally the same definition
of inner product as for polynomials). Another popular Hilbert space,
reminiscent in its appearance of finite-dimensional coordinate space, is
the space of all those sequences $(\xi_n)$ of numbers (real or complex, as the
case may be) for which $\sum_n |\xi_n|^2$ converges.
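
One concrete way to see that the space of polynomials is not complete is sketched below. It is offered as an illustration, not as part of the original argument: the particular sequence $x_n$ is our own choice, and the integral is assumed to run over the interval from 0 to 1.

```latex
x_n(t) = \sum_{k=0}^{n} \frac{t^k}{k!}, \qquad
\|x_n - x_m\| \le \sum_{k=m+1}^{n} \frac{1}{k!} \to 0 \quad (n > m \to \infty),
\qquad
\int_0^1 |x_n(t) - e^t|^2 \, dt \to 0 .
```

The sequence $(x_n)$ satisfies the Cauchy condition, but a polynomial $x$ with $\|x_n - x\| \to 0$ would have to agree with $e^t$ on the whole interval, and that is impossible because every sufficiently high derivative of a polynomial vanishes identically.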

Using completeness in order to discuss intelligently the convergence of 
some infinite sums, one can proceed for quite some time in building the 
theory of Hilbert spaces without meeting any difficulties due to infinite- 
dimensionality. Thus, for instance, the notions of orthogonality and of 
complete orthonormal sets can be defined in the general case exactly as we 
defined them. Our proof of Bessel’s inequality and of the equivalence of 
the various possible formulations of completeness for orthonormal sets 
have to undergo slight verbal changes only. (The convergence of the 
various infinite sums that enter is an automatic consequence of Bessel’s 
inequality.) Our proof of Schwarz’s inequality is valid, as it stands, in 
the most general case. Finally, the proof of the existence of complete 
orthonormal sets parallels closely the proof in the finite case. In the 
unconstructive proof Zorn’s lemma (or transfinite induction) replaces 
ordinary induction, and even the constructive steps of the Gram-Schmidt 
process are easily carried out. 

In the discussion of manifolds, functionals, and transformations the 
situation becomes uncomfortable if we do not make a concession to the 
topology of Hilbert space. Good generalizations of all our statements for 
the finite-dimensional case can be proved if we consider closed linear 
manifolds, continuous linear functionals, and bounded linear transformations. 
(In a finite-dimensional space every linear manifold is closed, every linear 
functional is continuous, and every linear transformation is bounded.) If, 
however, we do agree to make these concessions, then once more we can 
coast on our finite-dimensional proofs without any change most of the 
time, and with only the insertion of an occasional $\varepsilon$ the rest of the time.
Thus once more we obtain that $\mathcal{V} = \mathcal{M} \oplus \mathcal{M}^\perp$, that $\mathcal{M} = \mathcal{M}^{\perp\perp}$, and that
every linear functional of $x$ has the form $(x, y)$; our definitions of self-adjoint
and of positive transformations still make sense, and all our theorems
about perpendicular projections (as well as their proofs) carry over
without change.

The first hint of how things can go wrong comes from the study of orthog- 
onal and unitary transformations. We still call a transformation U 
orthogonal or unitary (according as the space is real or complex) if UU* 
= U*U = 1, and it is still true that such a transformation is isometric, 
that is, that $\|Ux\| = \|x\|$ for all $x$, or, equivalently, $(Ux, Uy) = (x, y)$
for all $x$ and $y$. It is, however, easy to construct an isometric transformation
that is not unitary; because of its importance in the construction of
counterexamples we shall describe one such transformation. We consider
a Hilbert space in which there is a countable complete orthonormal set,
say $\{x_0, x_1, x_2, \ldots\}$. A unique bounded linear transformation $U$ is
defined by the conditions $Ux_n = x_{n+1}$ for $n = 0, 1, 2, \ldots$. This $U$ is
isometric ($U^*U = 1$), but, since $UU^*x_0 = 0$, it is not true that $UU^* = 1$.
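
The behavior of this transformation can be imitated on vectors with only finitely many non-zero coordinates. The sketch below rests on assumptions that go beyond the text: a vector is represented by the list of its leading coordinates with respect to $x_0, x_1, x_2, \ldots$ (all later coordinates being zero), and the formula used for $U^*$, which the text does not spell out, is the one obtained from $(Ux, y) = (x, U^*y)$.

```python
def shift(x):
    """U: sends the coordinate sequence (a0, a1, ...) to (0, a0, a1, ...)."""
    return [0] + list(x)

def shift_adjoint(x):
    """U*: sends (a0, a1, a2, ...) to (a1, a2, ...), discarding a0."""
    return list(x[1:])

def norm_sq(x):
    """Square of the norm: sum of squared absolute values of the coordinates."""
    return sum(abs(a) ** 2 for a in x)

x = [3, -1, 4]      # the vector 3*x0 - x1 + 4*x2
x0 = [1]            # the basis vector x0

print(shift_adjoint(shift(x)))            # U*U x is x again
print(norm_sq(shift(x)), norm_sq(x))      # U preserves norms
print(shift(shift_adjoint(x0)))           # UU* x0 is the zero vector
```

The first two lines of output reflect $U^*U = 1$, and the last shows why $UU^* = 1$ fails: the coordinate along $x_0$ is lost by $U^*$ and cannot be restored by $U$.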

It is when we come to spectral theory that the whole flavor of the development
changes radically. The definition of proper value as a number $\lambda$
for which $Ax = \lambda x$ has a non-zero solution still makes sense, and our theorem
about the reality of the proper values of a self-adjoint transformation
is still true. The notion of proper value loses, however, much of its significance.
Proper values are so very useful in the finite-dimensional case
because they are a handy way of describing the fact that something goes
wrong with the inverse of $A - \lambda$, and the only thing that can go wrong is
that the inverse refuses to exist. Essentially different things can happen
in the infinite-dimensional case; just to illustrate the possibilities, we
mention, for example, that the inverse of $A - \lambda$ may exist but be unbounded.
That there is no useful generalization of determinant, and
hence of the characteristic equation, is the least of our worries. The 
whole theory has, in fact, attained its full beauty and maturity only after 
the slavish imitation of such finite-dimensional methods was given up. 

After some appreciation of the fact that the infinite-dimensional case 
has to overcome great difficulties, it comes as a pleasant surprise that the 
spectral theorem for self-adjoint transformations (and, in the complex 
case, even for normal ones) does have a very beautiful and powerful 
generalization. (Although we describe the theorem for bounded trans- 
formations only, there is a large class of unbounded ones for which it is 
valid.) In order to be able to understand the analogy, let us re-examine 
the finite-dimensional case. 

Let $A$ be a self-adjoint linear transformation on a finite-dimensional
inner product space, and let $A = \sum_j \lambda_j F_j$ be its spectral form. If $M$ is
an interval in the real axis, we write $E(M)$ for the sum of all those $F_j$ for
which $\lambda_j$ belongs to $M$. It is clear that $E(M)$ is a perpendicular projection
for each $M$. The following properties of the projection-valued interval-function
$E$ are the crucial ones: if $M$ is the union of a countable collection
$\{M_n\}$ of disjoint intervals, then

$$E(M) = \sum_n E(M_n), \tag{1}$$

and if $M$ is the improper interval consisting of all real numbers, then
$E(M) = 1$. The relation between $A$ and $E$ is described by the equation

$$A = \sum_j \lambda_j E(\{\lambda_j\}),$$

where, of course, $\{\lambda_j\}$ is the degenerate interval consisting of the single
number $\lambda_j$. Those familiar with Lebesgue-Stieltjes integration will recognize
the last written sum as a typical approximating sum to an integral of
the form $\int \lambda \, dE(\lambda)$ and will therefore see how one may expect the generalization
to go. The algebraic concept of summation is to be replaced by
the analytic concept of integration; the generalized relation between $A$ and
$E$ is described by the equation

$$A = \int \lambda \, dE(\lambda). \tag{2}$$


Except for this formal alteration, the spectral theorem for self-adjoint 
transformations is true in Hilbert space. We have, of course, to interpret 
correctly the meaning of the limiting operations involved in (1) and (2). 
Once more we are faced with the three possibilities mentioned in § 91. 
They are called uniform, strong, and weak convergence respectively, and 
it turns out that both (1) and (2) may be given the strong interpretation. 
(The reader deduces, of course, from our language that in an infinite-di- 
mensional Hilbert space the three possibilities are indeed distinct.) 

We have seen that the projections $F_j$ entering into the spectral form of $A$
in the finite-dimensional case are very simple functions of $A$ (§ 82). Since
the $E(M)$ are obtained from the $F_j$ by summation, they also are functions
of $A$, and it is quite easy to describe what functions. We write $g_M(t) = 1$
if $t$ is in $M$ and $g_M(t) = 0$ otherwise; then $E(M) = g_M(A)$. This fact
gives the main clue to a possible proof of the general spectral theorem.
The usual process is to discuss the functional calculus for polynomials,
and, by limiting processes, to extend it to a class of functions that includes
all the functions $g_M$. Once this is done, we may define the interval-function
$E$ by writing $E(M) = g_M(A)$; there is no particular difficulty in
establishing that $E$ and $A$ satisfy (1) and (2).
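
In the finite-dimensional case the interval-function $E$ can be written out explicitly. The following sketch uses a hypothetical example, the symmetric matrix with rows $(2, 1)$ and $(1, 2)$, whose spectral form has the proper values 1 and 3; the function names and the restriction to closed intervals are our own simplifications.

```python
# Spectral form of the symmetric matrix A = [[2, 1], [1, 2]]: A = 1*F_1 + 3*F_3.
SPECTRAL_FORM = [
    (1.0, [[0.5, -0.5], [-0.5, 0.5]]),   # perpendicular projection on multiples of (1, -1)
    (3.0, [[0.5, 0.5], [0.5, 0.5]]),     # perpendicular projection on multiples of (1, 1)
]

def g(lo, hi, t):
    """g_M(t) for the closed interval M = [lo, hi]."""
    return 1.0 if lo <= t <= hi else 0.0

def E(lo, hi):
    """E(M) = g_M(A) = sum over j of g_M(lambda_j) F_j."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    for lam, F in SPECTRAL_FORM:
        w = g(lo, hi, lam)
        total = [[total[i][j] + w * F[i][j] for j in range(2)] for i in range(2)]
    return total

def recombine():
    """Sum over j of lambda_j E({lambda_j}), which should give back A."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    for lam, _ in SPECTRAL_FORM:
        P = E(lam, lam)                  # degenerate interval {lambda_j}
        total = [[total[i][j] + lam * P[i][j] for j in range(2)] for i in range(2)]
    return total

print(E(float("-inf"), float("inf")))    # the identity
print(E(0.0, 2.0))                       # F_1 alone
print(recombine())                       # A itself
```

The three printed matrices illustrate, for this example, that $E$ of the whole real axis is 1, that $E(M)$ picks out the projections belonging to proper values in $M$, and that the weighted sum of the $E(\{\lambda_j\})$ reproduces $A$.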

After the spectral theorem is proved, it is easy to deduce from it the
generalized versions of our theorems concerning square roots, the functional
calculus, the polar decomposition, and properties of commutativity, and,
in fact, to answer practically every askable question about bounded normal
transformations.

The chief difficulties that remain are the considerations of non-normal
and of unbounded transformations. Concerning general non-normal transformations,
it is quite easy to describe the state of our knowledge; it is
non-existent. No even unsatisfactory generalization exists for the triangular
form or for the Jordan canonical form and the theory of elementary
divisors. Very different is the situation concerning normal (and particularly
self-adjoint) unbounded transformations. (The reader will
sympathize with the desire to treat such transformations if he recalls
that the first and most important functional operation that most of us
learn is differentiation.) In this connection we shall barely hint at the
main obstacle the theory faces. It is not very difficult to show that
if a self-adjoint linear transformation is defined for all vectors of Hilbert 
space, then it is bounded. In other words, the first requirement con- 
cerning transformations that we are forced to give up is that they be de- 
fined everywhere. The discussion of the precise domain on which a self- 
adjoint transformation may be defined and of the extent to which this 
domain may be enlarged is the chief new difficulty encountered in the 
study of unbounded transformations. 
Alternating form, 50
functional, 21, 36
Cogredient, 83
Column, 65
rank, 91
Complement, 18
Complete metric space, 176, 189